Systems and methods for displaying a user interface

ABSTRACT

A method for displaying a user interface on an electronic device is described. The method includes presenting a user interface. The user interface includes a coordinate system. The coordinate system corresponds to physical coordinates based on sensor data. The method also includes displaying at least a target audio signal and an interfering audio signal on the user interface.

RELATED APPLICATIONS

This application is related to and claims priority from U.S. Provisional Patent Application Ser. No. 61/713,447 filed Oct. 12, 2012, for “SYSTEMS AND METHODS FOR MAPPING COORDINATES,” U.S. Provisional Patent Application Ser. No. 61/714,212 filed Oct. 15, 2012, for “SYSTEMS AND METHODS FOR MAPPING COORDINATES,” U.S. Provisional Application Ser. No. 61/624,181 filed Apr. 13, 2012, for “SYSTEMS, METHODS, AND APPARATUS FOR ESTIMATING DIRECTION OF ARRIVAL,” U.S. Provisional Application Ser. No. 61/642,954, filed May 4, 2012, for “SYSTEMS, METHODS, AND APPARATUS FOR ESTIMATING DIRECTION OF ARRIVAL” and U.S. Provisional Application No. 61/726,336, filed Nov. 14, 2012, for “SYSTEMS, METHODS, AND APPARATUS FOR ESTIMATING DIRECTION OF ARRIVAL.”

TECHNICAL FIELD

The present disclosure relates generally to electronic devices. More specifically, the present disclosure relates to systems and methods for displaying a user interface.

BACKGROUND

In the last several decades, the use of electronic devices has become common. In particular, advances in electronic technology have reduced the cost of increasingly complex and useful electronic devices. Cost reduction and consumer demand have proliferated the use of electronic devices such that they are practically ubiquitous in modern society. As the use of electronic devices has expanded, so has the demand for new and improved features of electronic devices. More specifically, electronic devices that perform functions faster, more efficiently or with higher quality are often sought after.

Some electronic devices (e.g., cellular phones, smart phones, computers, etc.) use audio or speech signals. These electronic devices may code speech signals for storage or transmission. For example, a cellular phone captures a user's voice or speech using a microphone. The microphone converts an acoustic signal into an electronic signal. This electronic signal may then be formatted (e.g., coded) for transmission to another device (e.g., cellular phone, smart phone, computer, etc.), for playback or for storage.

Noisy audio signals may pose particular challenges. For example, competing audio signals may reduce the quality of a desired audio signal. As can be observed from this discussion, systems and methods that improve audio signal quality in an electronic device may be beneficial.

SUMMARY

A method for displaying a user interface on an electronic device is described. The method includes presenting a user interface. The user interface includes a coordinate system. The coordinate system corresponds to physical coordinates based on sensor data. The method also includes displaying at least a target audio signal and an interfering audio signal on the user interface. The target audio signal may include a voice signal. The reference plane may be horizontal. The physical coordinates may be earth coordinates.

The method may include displaying a directionality of at least one of the target audio signal and the interfering audio signal captured by at least one microphone. The method may include displaying at least one icon corresponding to at least one of the target audio signal and the interfering audio signal. The method may include passing the target audio signal. The method may include attenuating the interfering audio signal. The method may include aligning at least a part of the user interface with a reference plane.

Aligning at least a part of the user interface may include mapping a two-dimensional polar plot into a three-dimensional display space. The coordinate system may maintain an orientation independent of electronic device orientation.

The method may include recognizing an audio signature. The method may also include looking up the audio signature in a database. The method may additionally include obtaining identification information corresponding to the audio signature. The method may further include displaying the identification information on the user interface. The identification information may be an image of a person corresponding to the audio signature.

An electronic device is also described. The electronic device includes a display. The display presents a user interface. The user interface includes a coordinate system. The coordinate system corresponds to physical coordinates based on sensor data. The display displays at least a target audio signal and an interfering audio signal on the user interface.

A computer-program product for displaying a user interface is also described. The computer-program product includes a non-transitory tangible computer-readable medium with instructions. The instructions include code for causing an electronic device to present a user interface. The user interface includes a coordinate system. The coordinate system corresponds to physical coordinates based on sensor data. The instructions also include code for causing the electronic device to display at least a target audio signal and an interfering audio signal on the user interface.

An apparatus for displaying a user interface is also described. The apparatus includes means for presenting a user interface. The user interface includes a coordinate system. The coordinate system corresponds to physical coordinates based on sensor data. The apparatus also includes means for displaying at least a target audio signal and an interfering audio signal on the user interface.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows multiple views of a multi-microphone handset;

FIG. 2A shows a far-field model of plane wave propagation relative to a microphone pair;

FIG. 2B shows multiple microphone pairs in a linear array;

FIG. 3A shows plots of unwrapped phase delay vs. frequency for four different directions of arrival (DOAs);

FIG. 3B shows plots of wrapped phase delay vs. frequency for the same four different directions of arrival as depicted in FIG. 3A;

FIG. 4A shows an example of measured phase delay values and calculated values for two DOA candidates;

FIG. 4B shows a linear array of microphones arranged along the top margin of a television screen;

FIG. 5A shows an example of calculating DOA differences for a frame;

FIG. 5B shows an example of calculating a DOA estimate;

FIG. 5C shows an example of identifying a DOA estimate for each frequency;

FIG. 6A shows an example of using calculated likelihoods to identify a best microphone pair and best DOA candidate for a given frequency;

FIG. 6B shows an example of likelihood calculation;

FIG. 7 shows an example of bias removal;

FIG. 8 shows another example of bias removal;

FIG. 9 shows an example of an anglogram that plots source activity likelihood at the estimated DOA over frame and frequency;

FIG. 10A shows an example of a speakerphone application;

FIG. 10B shows a mapping of pair-wise DOA estimates to a 360° range in the plane of the microphone array;

FIGS. 11A-B show an ambiguity in the DOA estimate;

FIG. 11C shows a relation between signs of observed DOAs and quadrants of an x-y plane;

FIGS. 12A-12D show an example in which the source is located above the plane of the microphones;

FIG. 13A shows an example of microphone pairs along non-orthogonal axes;

FIG. 13B shows an example of use of the array of FIG. 13A to obtain a DOA estimate with respect to the orthogonal x and y axes;

FIG. 13C illustrates a relation between arrival of parallel wavefronts at microphones of different arrays for examples of two different DOAs;

FIGS. 14A-14B show examples of pair-wise normalized beamformer/null beamformers (BFNFs) for a two-pair microphone array;

FIG. 15A shows a two-pair microphone array;

FIG. 15B shows an example of a pair-wise normalized minimum variance distortionless response (MVDR) BFNF;

FIG. 16A shows an example of a pair-wise BFNF for frequencies in which the matrix A^(H)A is not ill-conditioned;

FIG. 16B shows examples of steering vectors;

FIG. 17 shows a flowchart of one example of an integrated method of source direction estimation as described herein;

FIGS. 18-31 show examples of practical results of DOA estimation, source discrimination, and source tracking as described herein;

FIG. 32A shows a telephone design, and FIGS. 32B-32D show use of such a design in various modes with corresponding visualization displays;

FIG. 33A shows a flowchart for a method M10 according to a general configuration;

FIG. 33B shows an implementation T12 of task T10;

FIG. 33C shows an implementation T14 of task T10;

FIG. 33D shows a flowchart for an implementation M20 of method M10;

FIG. 34A shows a flowchart for an implementation M25 of method M20;

FIG. 34B shows a flowchart for an implementation M30 of method M10;

FIG. 34C shows a flowchart for an implementation M100 of method M30;

FIG. 35A shows a flowchart for an implementation M110 of method M100;

FIG. 35B shows a block diagram of an apparatus A5 according to a general configuration;

FIG. 35C shows a block diagram of an implementation A10 of apparatus A5;

FIG. 35D shows a block diagram of an implementation A15 of apparatus A10;

FIG. 36A shows a block diagram of an apparatus MF5 according to a general configuration;

FIG. 36B shows a block diagram of an implementation MF10 of apparatus MF5;

FIG. 36C shows a block diagram of an implementation MF15 of apparatus MF10;

FIG. 37A illustrates a use of a device to represent a three-dimensional direction of arrival in a plane of the device;

FIG. 37B illustrates an intersection of the cones of confusion that represent respective responses of microphone arrays having non-orthogonal axes to a point source positioned outside the plane of the axes;

FIG. 37C illustrates a line of intersection of the cones of FIG. 37B;

FIG. 38A shows a block diagram of an audio preprocessing stage;

FIG. 38B shows a block diagram of a three-channel implementation of an audio preprocessing stage;

FIG. 39A shows a block diagram of an implementation of an apparatus that includes means for indicating a direction of arrival;

FIG. 39B shows an example of an ambiguity that results from the one-dimensionality of a DOA estimate from a linear array;

FIG. 39C illustrates one example of a cone of confusion;

FIG. 40 shows an example of source confusion in a speakerphone application in which three sources are located in different respective directions relative to a device having a linear microphone array;

FIG. 41A shows a 2-D microphone array that includes two microphone pairs having orthogonal axes;

FIG. 41B shows a flowchart of a method according to a general configuration that includes tasks;

FIG. 41C shows an example of a DOA estimate shown on a display;

FIG. 42A shows one example of correspondences between the signs of 1-D estimates and corresponding quadrants of the plane defined by array axes;

FIG. 42B shows another example of correspondences between the signs of 1-D estimates and corresponding quadrants of the plane defined by array axes;

FIG. 42C shows a correspondence between the four values of the tuple (sign(θ_(x)), sign(θ_(y))) and the quadrants of the plane;

FIG. 42D shows a 360-degree display according to an alternate mapping;

FIG. 43A shows an example that is similar to FIG. 41A but depicts a more general case in which the source is located above the x-y plane;

FIG. 43B shows another example of a 2-D microphone array whose axes define an x-y plane and a source that is located above the x-y plane;

FIG. 43C shows an example of such a general case in which a point source is elevated above the plane defined by the array axes;

FIGS. 44A-44D show a derivation of a conversion of (θ_(x), θ_(y)) into an angle in the array plane;

FIG. 44E illustrates one example of a projection p and an angle of elevation;

FIG. 45A shows a plot obtained by applying an alternate mapping;

FIG. 45B shows an example of intersecting cones of confusion associated with responses of linear microphone arrays having non-orthogonal axes x and r to a common point source;

FIG. 45C shows the lines of intersection of cones;

FIG. 46A shows an example of a microphone array;

FIG. 46B shows an example of obtaining a combined directional estimate in the x-y plane with respect to orthogonal axes x and y with observations (θ_(x), θ_(r)) from an array as shown in FIG. 46A;

FIG. 46C illustrates one example of a projection;

FIG. 46D illustrates one example of determining a value from the dimensions of a projection vector;

FIG. 46E illustrates another example of determining a value from the dimensions of a projection vector;

FIG. 47A shows a flowchart of a method according to another general configuration that includes instances of tasks;

FIG. 47B shows a flowchart of an implementation of a task that includes subtasks;

FIG. 47C illustrates one example of an apparatus with components for performing functions corresponding to FIG. 47A;

FIG. 47D illustrates one example of an apparatus including means for performing functions corresponding to FIG. 47A;

FIG. 48A shows a flowchart of one implementation of a method that includes a task;

FIG. 48B shows a flowchart for an implementation of another method;

FIG. 49A shows a flowchart of another implementation of a method;

FIG. 49B illustrates one example of an indication of an estimated angle of elevation relative to a display plane;

FIG. 49C shows a flowchart of such an implementation of another method that includes a task;

FIGS. 50A and 50B show examples of a display before and after a rotation;

FIGS. 51A and 51B show other examples of a display before and after a rotation;

FIG. 52A shows an example in which a device coordinate system E is aligned with the world coordinate system;

FIG. 52B shows an example in which a device is rotated and the matrix F that corresponds to an orientation;

FIG. 52C shows a perspective mapping, onto a display plane of a device, of a projection of a DOA onto the world reference plane;

FIG. 53A shows an example of a mapped display of the DOA as projected onto the world reference plane;

FIG. 53B shows a flowchart of such another implementation of a method;

FIG. 53C illustrates examples of interfaces including a linear slider potentiometer, a rocker switch and a wheel or knob;

FIG. 54A illustrates one example of a user interface;

FIG. 54B illustrates another example of a user interface;

FIG. 54C illustrates another example of a user interface;

FIGS. 55A and 55B show a further example in which an orientation sensor is used to track an orientation of a device;

FIG. 56 is a block diagram illustrating one configuration of an electronic device in which systems and methods for mapping a source location may be implemented;

FIG. 57 is a flow diagram illustrating one configuration of a method for mapping a source location;

FIG. 58 is a block diagram illustrating a more specific configuration of an electronic device in which systems and methods for mapping a source location may be implemented;

FIG. 59 is a flow diagram illustrating a more specific configuration of a method for mapping a source location;

FIG. 60 is a flow diagram illustrating one configuration of a method for performing an operation based on the mapping;

FIG. 61 is a flow diagram illustrating another configuration of a method for performing an operation based on the mapping;

FIG. 62 is a block diagram illustrating one configuration of a user interface in which systems and methods for displaying a user interface on an electronic device may be implemented;

FIG. 63 is a flow diagram illustrating one configuration of a method for displaying a user interface on an electronic device;

FIG. 64 is a block diagram illustrating one configuration of a user interface in which systems and methods for displaying a user interface on an electronic device may be implemented;

FIG. 65 is a flow diagram illustrating a more specific configuration of a method for displaying a user interface on an electronic device;

FIG. 66 illustrates examples of the user interface for displaying a directionality of at least one audio signal;

FIG. 67 illustrates another example of the user interface for displaying a directionality of at least one audio signal;

FIG. 68 illustrates another example of the user interface for displaying a directionality of at least one audio signal;

FIG. 69 illustrates another example of the user interface for displaying a directionality of at least one audio signal;

FIG. 70 illustrates another example of the user interface for displaying a directionality of at least one audio signal;

FIG. 71 illustrates an example of a sector selection feature of the user interface;

FIG. 72 illustrates another example of the sector selection feature of the user interface;

FIG. 73 illustrates another example of the sector selection feature of the user interface;

FIG. 74 illustrates more examples of the sector selection feature of the user interface;

FIG. 75 illustrates more examples of the sector selection feature of the user interface;

FIG. 76 is a flow diagram illustrating one configuration of a method for editing a sector;

FIG. 77 illustrates examples of a sector editing feature of the user interface;

FIG. 78 illustrates more examples of the sector editing feature of the user interface;

FIG. 79 illustrates more examples of the sector editing feature of the user interface;

FIG. 80 illustrates more examples of the sector editing feature of the user interface;

FIG. 81 illustrates more examples of the sector editing feature of the user interface;

FIG. 82 illustrates an example of the user interface with a coordinate system oriented independent of electronic device orientation;

FIG. 83 illustrates another example of the user interface with the coordinate system oriented independent of electronic device orientation;

FIG. 84 illustrates another example of the user interface with the coordinate system oriented independent of electronic device orientation;

FIG. 85 illustrates another example of the user interface with the coordinate system oriented independent of electronic device orientation;

FIG. 86 illustrates more examples of the user interface with the coordinate system oriented independent of electronic device orientation;

FIG. 87 illustrates another example of the user interface with the coordinate system oriented independent of electronic device orientation;

FIG. 88 is a block diagram illustrating another configuration of the user interface in which systems and methods for displaying a user interface on an electronic device may be implemented;

FIG. 89 is a flow diagram illustrating another configuration of a method for displaying a user interface on an electronic device;

FIG. 90 illustrates an example of the user interface coupled to a database;

FIG. 91 is a flow diagram illustrating another configuration of a method for displaying a user interface on an electronic device;

FIG. 92 is a block diagram illustrating one configuration of a wireless communication device in which systems and methods for mapping a source location may be implemented;

FIG. 93 illustrates various components that may be utilized in an electronic device; and

FIG. 94 illustrates another example of a user interface.

DETAILED DESCRIPTION

The 3rd Generation Partnership Project (3GPP) is a collaboration between groups of telecommunications associations that aims to define a globally applicable 3rd generation (3G) mobile phone specification. 3GPP Long Term Evolution (LTE) is a 3GPP project aimed at improving the Universal Mobile Telecommunications System (UMTS) mobile phone standard. The 3GPP may define specifications for the next generation of mobile networks, mobile systems and mobile devices.

It should be noted that, in some cases, the systems and methods disclosed herein may be described in terms of one or more specifications, such as the 3GPP Release-8 (Rel-8), 3GPP Release-9 (Rel-9), 3GPP Release-10 (Rel-10), LTE, LTE-Advanced (LTE-A), Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Enhanced Data Rates for GSM Evolution (EDGE), Time Division Long-Term Evolution (TD-LTE), Time Division Synchronous Code Division Multiple Access (TD-SCDMA), Frequency-Division Duplexing Long-Term Evolution (FDD-LTE), UMTS, GSM EDGE Radio Access Network (GERAN), Global Positioning System (GPS), etc. However, at least some of the concepts described herein may be applied to other wireless communication systems. For example, the term electronic device may be used to refer to a User Equipment (UE). Furthermore, the term base station may be used to refer to at least one of the terms Node B, Evolved Node B (eNB), Home Evolved Node B (HeNB), etc.

Unless expressly limited by its context, the term “signal” is used herein to indicate any of its ordinary meanings, including a state of a memory location (or set of memory locations) as expressed on a wire, bus, or other transmission medium. Unless expressly limited by its context, the term “generating” is used herein to indicate any of its ordinary meanings, such as computing or otherwise producing. Unless expressly limited by its context, the term “calculating” is used herein to indicate any of its ordinary meanings, such as computing, evaluating, estimating and/or selecting from a plurality of values. Unless expressly limited by its context, the term “obtaining” is used to indicate any of its ordinary meanings, such as calculating, deriving, receiving (e.g., from an external device), and/or retrieving (e.g., from an array of storage elements). Unless expressly limited by its context, the term “selecting” is used to indicate any of its ordinary meanings, such as identifying, indicating, applying, and/or using at least one, and fewer than all, of a set of two or more. Unless expressly limited by its context, the term “determining” is used to indicate any of its ordinary meanings, such as deciding, establishing, concluding, calculating, selecting and/or evaluating. Where the term “comprising” is used in the present description and claims, it does not exclude other elements or operations. The term “based on” (as in “A is based on B”) is used to indicate any of its ordinary meanings, including the cases (i) “derived from” (e.g., “B is a precursor of A”), (ii) “based on at least” (e.g., “A is based on at least B”) and, if appropriate in the particular context, (iii) “equal to” (e.g., “A is equal to B” or “A is the same as B”). Similarly, the term “in response to” is used to indicate any of its ordinary meanings, including “in response to at least.” Unless otherwise indicated, the terms “at least one of A, B, and C” and “one or more of A, B, and C” indicate “A and/or B and/or C.”

References to a “location” of a microphone of a multi-microphone audio sensing device indicate the location of the center of an acoustically sensitive face of the microphone, unless otherwise indicated by the context. The term “channel” is used at times to indicate a signal path and at other times to indicate a signal carried by such a path, according to the particular context. Unless otherwise indicated, the term “series” is used to indicate a sequence of two or more items. The term “logarithm” is used to indicate the base-ten logarithm, although extensions of such an operation to other bases are within the scope of this disclosure. The term “frequency component” is used to indicate one among a set of frequencies or frequency bands of a signal, such as a sample (or “bin”) of a frequency domain representation of the signal (e.g., as produced by a fast Fourier transform) or a subband of the signal (e.g., a Bark scale or mel scale subband).

Unless indicated otherwise, any disclosure of an operation of an apparatus having a particular feature is also expressly intended to disclose a method having an analogous feature (and vice versa), and any disclosure of an operation of an apparatus according to a particular configuration is also expressly intended to disclose a method according to an analogous configuration (and vice versa). The term “configuration” may be used in reference to a method, apparatus and/or system as indicated by its particular context. The terms “method,” “process,” “procedure,” and “technique” are used generically and interchangeably unless otherwise indicated by the particular context. A “task” having multiple subtasks is also a method. The terms “apparatus” and “device” are also used generically and interchangeably unless otherwise indicated by the particular context. The terms “element” and “module” are typically used to indicate a portion of a greater configuration. Unless expressly limited by its context, the term “system” is used herein to indicate any of its ordinary meanings, including “a group of elements that interact to serve a common purpose.”

Any incorporation by reference of a portion of a document shall also be understood to incorporate definitions of terms or variables that are referenced within the portion, where such definitions appear elsewhere in the document, as well as any figures referenced in the incorporated portion. Unless initially introduced by a definite article, an ordinal term (e.g., “first,” “second,” “third,” etc.) used to modify a claim element does not by itself indicate any priority or order of the claim element with respect to another, but rather merely distinguishes the claim element from another claim element having a same name (but for use of the ordinal term). Unless expressly limited by its context, each of the terms “plurality” and “set” is used herein to indicate an integer quantity that is greater than one.

A. Systems, Methods and Apparatus for Estimating Direction of Arrival

A method of processing a multichannel signal includes calculating, for each of a plurality of different frequency components of the multichannel signal, a difference between a phase of the frequency component in each of a first pair of channels of the multichannel signal, to obtain a plurality of phase differences. This method also includes estimating an error, for each of a plurality of candidate directions, between the candidate direction and a vector that is based on the plurality of phase differences. This method also includes selecting, from among the plurality of candidate directions, a candidate direction that corresponds to the minimum among the estimated errors. In this method, each of said first pair of channels is based on a signal produced by a corresponding one of a first pair of microphones, and at least one of the different frequency components has a wavelength that is less than twice the distance between the microphones of the first pair.

It may be assumed that in the near-field and far-field regions of an emitted sound field, the wavefronts are spherical and planar, respectively. The near-field may be defined as that region of space that is less than one wavelength away from a sound receiver (e.g., a microphone array). Under this definition, the distance to the boundary of the region varies inversely with frequency. At frequencies of two hundred, seven hundred, and two thousand hertz, for example, the distance to a one-wavelength boundary is about 170, forty-nine, and seventeen centimeters, respectively. It may be useful instead to consider the near-field/far-field boundary to be at a particular distance from the microphone array (e.g., fifty centimeters from a microphone of the array or from the centroid of the array, or one meter or 1.5 meters from a microphone of the array or from the centroid of the array).
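
The distances above follow directly from dividing the speed of sound by the frequency. A minimal check of that arithmetic (assuming c ≈ 343 m/s; the exact value depends on temperature):

```python
# One-wavelength near-field boundary for a few frequencies (c ~= 343 m/s assumed).
SPEED_OF_SOUND_M_S = 343.0

def one_wavelength_boundary_cm(frequency_hz: float) -> float:
    """Distance of the one-wavelength near-field boundary, in centimeters."""
    return 100.0 * SPEED_OF_SOUND_M_S / frequency_hz

for f_hz in (200.0, 700.0, 2000.0):
    print(f"{f_hz:6.0f} Hz -> {one_wavelength_boundary_cm(f_hz):5.1f} cm")
# Prints roughly 171.5, 49.0 and 17.2 cm, matching the approximate
# 170, forty-nine and seventeen centimeter figures quoted above.
```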

Various configurations are now described with reference to the Figures, where like reference numbers may indicate functionally similar elements. The systems and methods as generally described and illustrated in the Figures herein could be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of several configurations, as represented in the Figures, is not intended to limit scope, as claimed, but is merely representative of the systems and methods. Features and/or elements depicted in a Figure may be combined with at least one feature and/or element depicted in at least one other Figure.

FIG. 1 shows an example of a multi-microphone handset H100 (e.g., a multi-microphone device) that includes a first microphone pair MV10-1, MV10-3 whose axis is in a left-right direction of a front face of the device, and a second microphone pair MV10-1, MV10-2 whose axis is in a front-back direction (i.e., orthogonal to the front face). Such an arrangement may be used to determine when a user is speaking at the front face of the device (e.g., in a browse-talk mode). The front-back pair may be used to resolve an ambiguity between front and back directions that the left-right pair typically cannot resolve on its own. In some implementations, the handset H100 may include one or more loudspeakers LS10, LS20L, LS20R, a touchscreen TS10, a lens L10 and/or one or more additional microphones ME10, MR10.

In addition to a handset as shown in FIG. 1, other examples of audio sensing devices that may be implemented to include a multi-microphone array and to perform a method as described herein include portable computing devices (e.g., laptop computers, notebook computers, netbook computers, ultra-portable computers, tablet computers, mobile Internet devices, smartbooks, smartphones, etc.), audio- or video-conferencing devices, and display screens (e.g., computer monitors, television sets).

A device as shown in FIG. 1 may be configured to determine the direction of arrival (DOA) of a source signal by measuring a difference (e.g., a phase difference) between the microphone channels for each frequency bin to obtain an indication of direction, and averaging the direction indications over all bins to determine whether the estimated direction is consistent over all bins. The range of frequency bins that may be available for tracking is typically constrained by the spatial aliasing frequency for the microphone pair. This upper limit may be defined as the frequency at which the wavelength of the signal is twice the distance, d, between the microphones. Such an approach may not support accurate tracking of source DOA beyond one meter and typically may support only a low DOA resolution. Moreover, dependence on a front-back pair to resolve ambiguity may be a significant constraint on the microphone placement geometry, as placing the device on a surface may effectively occlude the front or back microphone. Such an approach also typically uses only one fixed pair for tracking.
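
As an illustration of that upper limit, the spatial aliasing frequency for a pair is the frequency whose wavelength equals twice the inter-microphone distance d, i.e., c/(2d). A minimal sketch (the 4 cm spacing is only an example, not a value taken from this disclosure):

```python
# Spatial aliasing frequency for a microphone pair (c ~= 343 m/s assumed).
SPEED_OF_SOUND_M_S = 343.0

def spatial_aliasing_frequency_hz(mic_spacing_m: float) -> float:
    """Frequency at which the signal wavelength equals twice the spacing d."""
    return SPEED_OF_SOUND_M_S / (2.0 * mic_spacing_m)

# Example: a hypothetical 4 cm spacing aliases above roughly 4.3 kHz, so
# phase-difference measurements above that frequency wrap.
print(spatial_aliasing_frequency_hz(0.04))  # ~4287.5
```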

It may be desirable to provide a generic speakerphone application such that the multi-microphone device may be placed arbitrarily (e.g., on a table for a conference call, on a car seat, etc.) and track and/or enhance the voices of individual speakers. Such an approach may be capable of dealing with an arbitrary target speaker position with respect to an arbitrary orientation of available microphones. It may also be desirable for such an approach to provide instantaneous multi-speaker tracking/separating capability. Unfortunately, the current state of the art is a single-microphone approach.

It may also be desirable to support source tracking in a far-field application, which may be used to provide solutions for tracking sources at large distances and unknown orientations with respect to the multi-microphone device. The multi-microphone device in such an application may include an array mounted on a television or set-top box, which may be used to support telephony. Examples include the array of a Kinect device (Microsoft Corp., Redmond, Wash.) and arrays from Skype (Microsoft Skype Division) and Samsung Electronics (Seoul, KR). In addition to the large source-to-device distance, such applications typically also suffer from a bad signal-to-interference-noise ratio (SINR) and room reverberation.

It is a challenge to provide a method for estimating a three-dimensional direction of arrival (DOA) for each frame of an audio signal for concurrent multiple sound events that is sufficiently robust under background noise and reverberation. Robustness can be obtained by maximizing the number of reliable frequency bins. It may be desirable for such a method to be suitable for arbitrarily shaped microphone array geometry, such that specific constraints on microphone geometry may be avoided. A pair-wise 1-D approach as described herein can be appropriately incorporated into any geometry.

The systems and methods disclosed herein may be implemented for such a generic speakerphone application or far-field application. Such an approach may be implemented to operate without a microphone placement constraint. Such an approach may also be implemented to track sources using available frequency bins up to Nyquist frequency and down to a lower frequency (e.g., by supporting use of a microphone pair having a larger inter-microphone distance). Rather than being limited to a single pair for tracking, such an approach may be implemented to select a best pair among all available pairs. Such an approach may be used to support source tracking even in a far-field scenario, up to a distance of three to five meters or more, and to provide a much higher DOA resolution. Other potential features include obtaining an exact 2-D representation of an active source. For best results, it may be desirable that each source is a sparse broadband audio source, and that each frequency bin is mostly dominated by no more than one source.

FIG. 33A shows a flowchart for a method M10 according to a general configuration that includes tasks T10, T20 and T30. Task T10 calculates a difference between a pair of channels of a multichannel signal (e.g., in which each channel is based on a signal produced by a corresponding microphone). For each among a plurality K of candidate directions, task T20 calculates a corresponding directional error that is based on the calculated difference. Based on the K directional errors, task T30 selects a candidate direction.

Method M10 may be configured to process the multichannel signal as a series of segments. Typical segment lengths range from about five or ten milliseconds to about forty or fifty milliseconds, and the segments may be overlapping (e.g., with adjacent segments overlapping by 25% or 50%) or non-overlapping. In one particular example, the multichannel signal is divided into a series of non-overlapping segments or “frames,” each having a length of ten milliseconds. In another particular example, each frame has a length of twenty milliseconds. A segment as processed by method M10 may also be a segment (i.e., a “subframe”) of a larger segment as processed by a different operation, or vice versa.
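
A minimal sketch of such segmentation for one channel (the frame length and hop size are illustrative, and numpy is assumed):

```python
import numpy as np

def split_into_frames(channel: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split one channel into frames; hop == frame_len gives non-overlapping
    frames, hop == frame_len // 2 gives 50% overlap."""
    n_frames = 1 + (len(channel) - frame_len) // hop
    return np.stack([channel[i * hop:i * hop + frame_len] for i in range(n_frames)])

# Example: ten-millisecond non-overlapping frames at a 16 kHz sampling rate.
fs_hz = 16000
x = np.random.randn(fs_hz)                       # one second of one channel
frames = split_into_frames(x, frame_len=160, hop=160)
print(frames.shape)                              # (100, 160)
```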

Examples of differences between the channels include a gain difference or ratio, a time difference of arrival, and a phase difference. For example, task T10 may be implemented to calculate the difference between the channels of a pair as a difference or ratio between corresponding gain values of the channels (e.g., a difference in magnitude or energy). FIG. 33B shows such an implementation T12 of task T10.

Task T12 may be implemented to calculate measures of the gain of a segment of the multichannel signal in the time domain (e.g., for each of a plurality of subbands of the signal) or in a frequency domain (e.g., for each of a plurality of frequency components of the signal in a transform domain, such as a fast Fourier transform (FFT), discrete cosine transform (DCT), or modified DCT (MDCT) domain). Examples of such gain measures include, without limitation, the following: total magnitude (e.g., sum of absolute values of sample values), average magnitude (e.g., per sample), root mean square (RMS) amplitude, median magnitude, peak magnitude, peak energy, total energy (e.g., sum of squares of sample values), and average energy (e.g., per sample).

In order to obtain accurate results with a gain-difference technique, it may be desirable for the responses of the two microphone channels to be calibrated relative to each other. It may be desirable to apply a low-pass filter to the multichannel signal such that calculation of the gain measure is limited to an audio-frequency component of the multichannel signal.

Task T12 may be implemented to calculate a difference between gains as a difference between corresponding gain measure values for each channel in a logarithmic domain (e.g., values in decibels) or, equivalently, as a ratio between the gain measure values in a linear domain. For a calibrated microphone pair, a gain difference of zero may be taken to indicate that the source is equidistant from each microphone (i.e., located in a broadside direction of the pair), a gain difference with a large positive value may be taken to indicate that the source is closer to one microphone (i.e., located in one endfire direction of the pair), and a gain difference with a large negative value may be taken to indicate that the source is closer to the other microphone (i.e., located in the other endfire direction of the pair).
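
A minimal sketch of such a gain-difference computation for one segment, assuming calibrated channels (the function names, the RMS gain measure and the 3 dB threshold are illustrative choices, not part of this disclosure):

```python
import numpy as np

def gain_difference_db(channel_1: np.ndarray, channel_2: np.ndarray) -> float:
    """Difference between per-channel RMS gain measures, in decibels."""
    rms_1 = np.sqrt(np.mean(np.square(channel_1.astype(float))))
    rms_2 = np.sqrt(np.mean(np.square(channel_2.astype(float))))
    return 20.0 * np.log10(rms_1 / rms_2)

def coarse_direction(diff_db: float, threshold_db: float = 3.0) -> str:
    """Map a gain difference to the coarse indication described above."""
    if diff_db > threshold_db:
        return "endfire direction, source closer to microphone 1"
    if diff_db < -threshold_db:
        return "endfire direction, source closer to microphone 2"
    return "broadside direction (source roughly equidistant)"
```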

In another example, task T10 from FIG. 33A may be implemented to perform a cross-correlation on the channels to determine the difference (e.g., calculating a time-difference-of-arrival based on a lag between channels of the multichannel signal).

In a further example, task T10 is implemented to calculate the difference between the channels of a pair as a difference between the phase of each channel (e.g., at a particular frequency component of the signal). FIG. 33C shows such an implementation T14 of task T10. As discussed below, such calculation may be performed for each among a plurality of frequency components.

For a signal received by a pair of microphones directly from a point source in a particular direction of arrival (DOA) relative to the axis of the microphone pair, the phase delay differs for each frequency component and also depends on the spacing between the microphones. The observed value of the phase delay at a particular frequency component (or “bin”) may be calculated as the inverse tangent (also called the arctangent) of the ratio of the imaginary term of the complex FFT coefficient to the real term of the complex FFT coefficient.

As shown in FIG. 2A, the phase delay value Δφ_(f) for a source S01 for at least one microphone MC10, MC20 at a particular frequency, f, may be related to source DOA under a far-field (i.e., plane-wave) assumption as

$\Delta\phi_{f} = 2\pi f \frac{d \sin\theta}{c},$

where d denotes the distance between the microphones MC10, MC20 (in meters), θ denotes the angle of arrival (in radians) relative to a direction that is orthogonal to the array axis, f denotes frequency (in Hz), and c denotes the speed of sound (in m/s). As will be described below, the DOA estimation principles described herein may be extended to multiple microphone pairs in a linear array (e.g., as shown in FIG. 2B). For the ideal case of a single point source with no reverberation, the ratio of phase delay to frequency, Δφ_(f)/f, will have the same value

$2\pi \; f\frac{d\; \sin \; \theta}{c}$

over all frequencies. As discussed in more detail below, the DOA, θ, relative to a microphone pair is a one-dimensional measurement that defines the surface of a cone in space (e.g., such that the axis of the cone is the axis of the array).

Such an approach is typically limited in practice by the spatial aliasing frequency for the microphone pair, which may be defined as the frequency at which the wavelength of the signal is twice the distance d between the microphones. Spatial aliasing causes phase wrapping, which puts an upper limit on the range of frequencies that may be used to provide reliable phase delay measurements for a particular microphone pair.

FIG. 3A shows plots of unwrapped phase delay vs. frequency for four different DOAs D10, D20, D30, D40. FIG. 3B shows plots of wrapped phase delay vs. frequency for the same DOAs D10, D20, D30, D40, where the initial portion of each plot (i.e., until the first wrapping occurs) is shown in bold. Attempts to extend the useful frequency range of phase delay measurement by unwrapping the measured phase are typically unreliable.

Task T20 may be implemented to calculate the directional error in terms of phase difference. For example, task T20 may be implemented to calculate the directional error at frequency f, for each of an inventory of K DOA candidates, where 1≤k≤K, as a squared difference e_(ph_f_k) = (Δφ_(ob_f) − Δφ_(k_f))² (alternatively, an absolute difference e_(ph_f_k) = |Δφ_(ob_f) − Δφ_(k_f)|) between the observed phase difference and the phase difference corresponding to the DOA candidate.

Instead of phase unwrapping, a proposed approach compares the phase delay as measured (e.g., wrapped) with pre-calculated values of wrapped phase delay for each of an inventory of DOA candidates. FIG. 4A shows such an example that includes angle vs. frequency plots of the (noisy) measured phase delay values MPD10 and the phase delay values PD10, PD20 for two DOA candidates of the inventory (solid and dashed lines), where phase is wrapped to the range of −π to π. The DOA candidate that is best matched to the signal as observed may then be determined by calculating a corresponding directional error for each DOA candidate, θ_(i), and identifying the DOA candidate value that corresponds to the minimum among these directional errors. Such a directional error may be calculated, for example, as an error, e_(ph_k), between the phase delay values, Δφ_(k_f), for the k-th DOA candidate and the observed phase delay values, Δφ_(ob_f). In one example, the error, e_(ph_k), is expressed as ∥Δφ_(ob_f) − Δφ_(k_f)∥_(f)² over a desired range or other set F of frequency components, i.e., as the sum

$e_{ph\_k} = \sum_{f \in F} \left( \Delta\phi_{ob\_f} - \Delta\phi_{k\_f} \right)^{2}$

of the squared differences between the observed and candidate phase delay values over F. The phase delay values, Δφ_(k_f), for each DOA candidate, θ_(k), may be calculated before run-time (e.g., during design or manufacture), according to known values of c and d and the desired range of frequency components f, and retrieved from storage during use of the device. Such a pre-calculated inventory may be configured to support a desired angular range and resolution (e.g., a uniform resolution, such as one, two, five, six, ten, or twelve degrees; or a desired non-uniform resolution) and a desired frequency range and resolution (which may also be uniform or non-uniform).
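
A minimal sketch of this inventory-matching step for one frame, assuming the two channels have already been transformed with an FFT (the candidate grid, array names and sign convention for the phase difference are illustrative assumptions):

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0  # assumed value of c

def wrapped_phase_delay(theta_rad, freqs_hz, d_m):
    """Pre-computable wrapped phase delay for one DOA candidate (far-field model)."""
    unwrapped = 2.0 * np.pi * freqs_hz * d_m * np.sin(theta_rad) / SPEED_OF_SOUND_M_S
    return np.angle(np.exp(1j * unwrapped))             # wrap into (-pi, pi]

def select_doa_candidate(fft_ch1, fft_ch2, freqs_hz, d_m, candidates_rad):
    """Return the candidate with minimum summed squared phase-delay error e_ph_k."""
    observed = np.angle(fft_ch1 * np.conj(fft_ch2))      # observed wrapped differences
    errors = np.empty(len(candidates_rad))
    for k, theta_k in enumerate(candidates_rad):
        candidate = wrapped_phase_delay(theta_k, freqs_hz, d_m)
        errors[k] = np.sum((observed - candidate) ** 2)  # e_ph_k over the set F
    return candidates_rad[np.argmin(errors)], errors

# In practice the wrapped_phase_delay() values would be pre-calculated and stored,
# as described above, rather than recomputed for every frame.
```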

It may be desirable to calculate the directional error (e.g., e_(ph_f_k), e_(ph_k)) across as many frequency bins as possible to increase robustness against noise. For example, it may be desirable for the error calculation to include terms from frequency bins that are beyond the spatial aliasing frequency. In a practical application, the maximum frequency bin may be limited by other factors, which may include available memory, computational complexity, strong reflection by a rigid body (e.g., an object in the environment, a housing of the device) at high frequencies, etc.

A speech signal is typically sparse in the time-frequency domain. If the sources are disjoint in the frequency domain, then two sources can be tracked at the same time. If the sources are disjoint in the time domain, then two sources can be tracked at the same frequency. It may be desirable for the array to include a number of microphones that is at least equal to the number of different source directions to be distinguished at any one time. The microphones may be omnidirectional (e.g., as may be typical for a cellular telephone or a dedicated conferencing device) or directional (e.g., as may be typical for a device such as a set-top box).

Such multichannel processing is generally applicable, for example, to source tracking for speakerphone applications. Such a technique may be used to calculate a DOA estimate for a frame of the received multichannel signal. Such an approach may calculate, at each frequency bin, the error for each candidate angle with respect to the observed angle, which is indicated by the phase delay. The target angle at that frequency bin is the candidate having the minimum error. In one example, the error is then summed across the frequency bins to obtain a measure of likelihood for the candidate. In another example, one or more of the most frequently occurring target DOA candidates across all frequency bins is identified as the DOA estimate (or estimates) for a given frame.

Such a method may be applied to obtain instantaneous tracking results (e.g., with a delay of less than one frame). The delay is dependent on the FFT size and the degree of overlap. For example, for a 512-point FFT with a 50% overlap and a sampling frequency of 16 kilohertz (kHz), the resulting 256-sample delay corresponds to sixteen milliseconds. Such a method may be used to support differentiation of source directions typically up to a source-array distance of two to three meters, or even up to five meters.
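
The sixteen-millisecond figure is simple arithmetic on the hop size, as a quick check shows (values taken from the example above):

```python
fft_size = 512
overlap = 0.5
fs_hz = 16000

hop_samples = int(fft_size * (1.0 - overlap))   # 256 samples per frame advance
delay_ms = 1000.0 * hop_samples / fs_hz         # 16.0 milliseconds
print(hop_samples, delay_ms)
```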

The error may also be considered as a variance (i.e., the degree to which the individual errors deviate from an expected value). Conversion of the time-domain received signal into the frequency domain (e.g., by applying an FFT) has the effect of averaging the spectrum in each bin. This averaging is even more obvious if a subband representation is used (e.g., mel scale or Bark scale). Additionally, it may be desirable to perform time-domain smoothing on the DOA estimates (e.g., by applying a recursive smoother, such as a first-order infinite-impulse-response filter). It may be desirable to reduce the computational complexity of the error calculation operation (e.g., by using a search strategy, such as a binary tree, and/or applying known information, such as DOA candidate selections from one or more previous frames).

Even though the directional information may be measured in terms of phase delay, it is typically desired to obtain a result that indicates source DOA. Consequently, it may be desirable to implement task T20 to calculate the directional error at frequency f, for each of an inventory of K DOA candidates, in terms of DOA rather than in terms of phase delay.

An expression of directional error in terms of DOA may be derived by expressing the wrapped phase delay at frequency f (e.g., the observed phase delay, Δφ_(ob_f)) as a function Ψ_(f_wr) of the DOA, θ, of the signal, such as

$\Psi_{f\_wr}(\theta) = \mathrm{mod}\left( -2\pi f \frac{d \sin\theta}{c} + \pi,\; 2\pi \right) - \pi.$

We assume that this expression is equivalent to a corresponding expression for unwrapped phase delay as a function of DOA, such as

${{\Psi_{f\_ un}(\theta)} = {{- 2}\pi \; f\frac{d\; \sin \; \theta}{c}}},$

except near discontinuities that are due to phase wrapping. The directional error, e_(ph_f_k), may then be expressed in terms of observed DOA, θ_(ob), and candidate DOA, θ_(k), as e_(ph_f_k) = |Ψ_(f_wr)(θ_(ob)) − Ψ_(f_wr)(θ_(k))| ≡ |Ψ_(f_un)(θ_(ob)) − Ψ_(f_un)(θ_(k))| or e_(ph_f_k) = (Ψ_(f_wr)(θ_(ob)) − Ψ_(f_wr)(θ_(k)))² ≡ (Ψ_(f_un)(θ_(ob)) − Ψ_(f_un)(θ_(k)))², where the difference between the observed and candidate phase delay at frequency f is expressed in terms of observed DOA at frequency f, θ_(ob_f), and candidate DOA, θ_(k), as

${{\Psi_{f\_ un}( \theta_{ob} )} - {\Psi_{f\_ un}( \theta_{k} )}} = {\frac{{- 2}\pi \; {fd}}{c}{( {{\sin \; \theta_{ob\_ f}} - {\sin \; \theta_{k}}} ).}}$

A directional error, e_(ph_k), across F may then be expressed in terms of observed DOA, θ_(ob), and candidate DOA, θ_(k), as e_(ph_k) = ∥Ψ_(f_wr)(θ_(ob)) − Ψ_(f_wr)(θ_(k))∥_(f)² ≡ ∥Ψ_(f_un)(θ_(ob)) − Ψ_(f_un)(θ_(k))∥_(f)².

We perform a Taylor series expansion on this result to obtain the following first-order approximation:

${{\frac{{- 2}\pi \; {fd}}{c}( {{\sin \; \theta_{ob\_ f}} - {\sin \; \theta_{k}}} )} \approx {( {\theta_{ob\_ f} - \theta_{k}} )\frac{{- 2}\pi \; {fd}}{c}\cos \; \theta_{k}}},$

which is used to obtain an expression of the difference between the DOA, θ_(ob_f), as observed at frequency f and DOA candidate θ_(k):

$\left( \theta_{ob\_f} - \theta_{k} \right) \cong \frac{\Psi_{f\_un}(\theta_{ob}) - \Psi_{f\_un}(\theta_{k})}{\frac{2\pi f d}{c}\cos\theta_{k}}.$

This expression may be used (e.g., in task T20), with the assumed equivalence of observed wrapped phase delay to unwrapped phase delay, to express the directional error in terms of DOA (e_(DOA_f_k), e_(DOA_k)) rather than phase delay (e_(ph_f_k), e_(ph_k)):

$e_{DOA\_f\_k} = \left( \theta_{ob} - \theta_{k} \right)^{2} \cong \frac{\left( \Psi_{f\_wr}(\theta_{ob}) - \Psi_{f\_wr}(\theta_{k}) \right)^{2}}{\left( \frac{2\pi f d}{c}\cos\theta_{k} \right)^{2}}, \qquad e_{DOA\_k} = \left\| \theta_{ob} - \theta_{k} \right\|_{f}^{2} \cong \frac{\left\| \Psi_{f\_wr}(\theta_{ob}) - \Psi_{f\_wr}(\theta_{k}) \right\|_{f}^{2}}{\left\| \frac{2\pi f d}{c}\cos\theta_{k} \right\|_{f}^{2}},$

where the values of [Ψ_(f_wr)(θ_(ob)), Ψ_(f_wr)(θ_(k))] are defined as [Δφ_(ob_f), Δφ_(k_f)].

To avoid division by zero at the endfire directions (θ = +/−90°), it may be desirable to implement task T20 to perform such an expansion using a second-order approximation instead, as in the following:

$\theta_{ob} - \theta_{k} \cong \begin{cases} -C/B, & \theta_{k} = 0 \;(\text{broadside}) \\ \dfrac{-B + \sqrt{B^{2} - 4AC}}{2A}, & \text{otherwise}, \end{cases}$

where A = (πfd sin θ_(k))/c, B = (−2πfd cos θ_(k))/c, and C = −(Ψ_(f_un)(θ_(ob)) − Ψ_(f_un)(θ_(k))). As in the first-order example above, this expression may be used, with the assumed equivalence of observed wrapped phase delay to unwrapped phase delay, to express the directional error in terms of DOA as a function of the observed and candidate wrapped phase delay values.
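
A minimal sketch of the first-order conversion above, producing the per-frequency DOA-domain error e_(DOA_f_k) and its sum e_(DOA_k) for one candidate (variable names are illustrative; near endfire, where cos θ_(k) approaches zero, the second-order form given above would be used instead):

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0  # assumed value of c

def doa_domain_error(observed_wrapped_phase, candidate_theta_rad, freqs_hz, d_m):
    """First-order e_DOA_f_k per frequency bin and its sum e_DOA_k over F."""
    # Wrapped phase delay predicted by the candidate, Psi_f_wr(theta_k).
    unwrapped = (-2.0 * np.pi * freqs_hz * d_m *
                 np.sin(candidate_theta_rad) / SPEED_OF_SOUND_M_S)
    candidate_wrapped = np.angle(np.exp(1j * unwrapped))
    numerator = (observed_wrapped_phase - candidate_wrapped) ** 2
    denominator = (2.0 * np.pi * freqs_hz * d_m *
                   np.cos(candidate_theta_rad) / SPEED_OF_SOUND_M_S) ** 2
    e_doa_f_k = numerator / denominator
    return e_doa_f_k, float(np.sum(e_doa_f_k))
```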

FIGS. 5A-5C depict a plurality of frames 502. As shown in FIG. 5A, a directional error based on a difference between observed and candidate DOA for a given frame of the received signal may be calculated in such manner (e.g., by task T20) at each of a plurality of frequencies f of the received microphone signals (e.g., ∀f∈F) and for each of a plurality of DOA candidates θ_(k). It may be desirable to implement task T20 to perform a temporal smoothing operation on each directional error e according to an expression such as e_(s)(n) = βe_(s)(n−1) + (1−β)e(n) (also known as a first-order IIR or recursive filter), where e_(s)(n−1) denotes the smoothed directional error for the previous frame, e(n) denotes the current unsmoothed value of the directional error, e_(s)(n) denotes the current smoothed value of the directional error, and β is a smoothing factor whose value may be selected from the range from zero (no smoothing) to one (no updating). Typical values for smoothing factor β include 0.1, 0.2, 0.25, 0.3, 0.4 and 0.5. It is typical, but not necessary, for such an implementation of task T20 to use the same value of β to smooth directional errors that correspond to different frequency components. Similarly, it is typical, but not necessary, for such an implementation of task T20 to use the same value of β to smooth directional errors that correspond to different candidate directions. As demonstrated in FIG. 5B, a DOA estimate for a given frame may be determined by summing the squared differences for each candidate across all frequency bins in the frame to obtain a directional error (e.g., e_(ph_k) or e_(DOA_k)) and selecting the DOA candidate having the minimum error. Alternatively, as demonstrated in FIG. 5C, such differences may be used to identify the best-matched (i.e., minimum squared difference) DOA candidate at each frequency. A DOA estimate for the frame may then be determined as the most frequent DOA across all frequency bins.
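
A minimal sketch of the smoothing recursion and the two frame-level selection strategies just described (the FIG. 5B style summation and the FIG. 5C style per-bin vote); the smoothing factor is illustrative and the error array layout is an assumption:

```python
import numpy as np

def smooth_errors(previous_smoothed, current_errors, beta=0.25):
    """First-order recursive smoothing: e_s(n) = beta * e_s(n-1) + (1 - beta) * e(n)."""
    return beta * previous_smoothed + (1.0 - beta) * current_errors

def doa_by_summed_error(errors_f_k):
    """FIG. 5B style: sum per-bin errors over F for each candidate, pick the minimum.
    errors_f_k is assumed to have shape (F, K)."""
    return int(np.argmin(errors_f_k.sum(axis=0)))

def doa_by_per_bin_vote(errors_f_k):
    """FIG. 5C style: pick the best candidate per bin, then the most frequent one."""
    per_bin_best = np.argmin(errors_f_k, axis=1)
    return int(np.bincount(per_bin_best).argmax())
```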

Based on the directional errors, task T30 selects a candidate direction for the frequency component. For example, task T30 may be implemented to select the candidate direction associated with the lowest among the K directional errors produced by task T20. In another example, task T30 is implemented to calculate a likelihood based on each directional error and to select the candidate direction associated with the highest likelihood.

As shown in FIG. 6B, an error term 604 may be calculated for each candidate angle 606, i, and each of a set F of frequencies for each frame 608, k. It may be desirable to indicate a likelihood of source activity in terms of a calculated DOA difference or error term 604. One example of such a likelihood L may be expressed, for a particular frame, frequency and angle, as

${L( {i,f,k} )} = {\frac{1}{{{\theta_{ob} - \theta_{i}}}_{f,k}^{2}}.}$

For this expression, an extremely good match at a particular frequency may cause a corresponding likelihood to dominate all others. To reduce this susceptibility, it may be desirable to include a regularization term λ, as in the following expression:

${L( {i,f,k} )} = {\frac{1}{{{\theta_{ob} - \theta_{i}}}_{f,k}^{2} + \lambda}.}$

Speech tends to be sparse in both time and frequency, such that a sum over a set of frequencies F may include results from bins that are dominated by noise. It may be desirable to include a bias term β, as in the following expression:

${L( {i,f,k} )} = {\frac{1}{{{\theta_{ob} - \theta_{i}}}_{f,k}^{2} + \lambda} - {\beta.}}$

The bias term, which may vary over frequency and/or time, may be based on an assumed distribution of the noise (e.g., Gaussian). Additionally or alternatively, the bias term may be based on an initial estimate of the noise (e.g., from a noise-only initial frame). Additionally or alternatively, the bias term may be updated dynamically based on information from noise-only frames, as indicated, for example, by a voice activity detection module. FIGS. 7 and 8 show examples of plots of likelihood before and after bias removal, respectively. In FIG. 7, the frame number 710, an angle of arrival 712 and an amplitude 714 of a signal are illustrated. Similarly, in FIG. 8, the frame number 810, an angle of arrival 812 and an amplitude 814 of a signal are illustrated.

The frequency-specific likelihood results may be projected onto a (frame, angle) plane (e.g., as shown in FIG. 8) to obtain a DOA estimation per frame

$\theta_{est\_k} = \max_{i} \sum_{f \in F} L(i,f,k)$

that is robust to noise and reverberation because only target-dominant frequency bins contribute to the estimate. In this summation, terms in which the error is large may have values that approach zero and thus become less significant to the estimate. If a directional source is dominant in some frequency bins, the error value at those frequency bins may be nearer to zero for that angle. Also, if another directional source is dominant in other frequency bins, the error value at the other frequency bins may be nearer to zero for the other angle.
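
A minimal sketch of the regularized, bias-corrected likelihood and the per-frame projection above (the λ and β values are placeholders, and the error array layout is an assumption):

```python
import numpy as np

def doa_likelihood(squared_doa_error, lam=1e-6, bias=0.0):
    """L(i, f, k) = 1 / (||theta_ob - theta_i||^2 + lambda) - beta."""
    return 1.0 / (squared_doa_error + lam) - bias

def frame_doa_estimate(squared_errors_f_i, lam=1e-6, bias=0.0):
    """Sum the likelihood over the frequency set F and return the best candidate index.
    squared_errors_f_i has shape (F, I): per-bin squared DOA error per candidate angle."""
    likelihood = doa_likelihood(squared_errors_f_i, lam, bias)
    return int(np.argmax(likelihood.sum(axis=0)))
```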

The likelihood results may also be projected onto a (frame, frequency) plane as shown in the bottom panel 918 of FIG. 9 to indicate likelihood information per frequency bin, based on directional membership (e.g., for voice activity detection). The bottom panel 918 shows, for each frequency and frame, the corresponding likelihood for the estimated DOA

$\left( \text{e.g.,}\; \arg\max_{i} \sum_{f \in F} L(i,f,k) \right).$

This likelihood may be used to indicate likelihood of speech activity. Additionally or alternatively, such information may be used, for example, to support time- and/or frequency-selective masking of the received signal by classifying frames and/or frequency components according to their directions of arrival.

An anglogram representation, as shown in the bottom panel 918 of FIG. 9, is similar to a spectrogram representation. As shown in the top panel 916 of FIG. 9, a spectrogram may be obtained by plotting, at each frame, the magnitude of each frequency component. An anglogram may be obtained by plotting, at each frame, a likelihood of the current DOA candidate at each frequency.

FIG. 33D shows a flowchart for an implementation M20 of method M10 that includes tasks T100, T200 and T300. Such a method may be used, for example, to select a candidate direction of arrival of a source signal, based on information from a pair of channels of a multichannel signal, for each of a plurality F of frequency components of the multichannel signal. For each among the plurality F of frequency components, task T100 calculates a difference between the pair of channels. Task T100 may be implemented, for example, to perform a corresponding instance of task T10 (e.g., task T12 or T14) for each among the plurality F of frequency components.

For each among the plurality F of frequency components, task T200 calculates a plurality of directional errors. Task T200 may be implemented to calculate K directional errors for each frequency component. For example, task T200 may be implemented to perform a corresponding instance of task T20 for each among the plurality F of frequency components. Alternatively, task T200 may be implemented to calculate K directional errors for each among one or more of the frequency components, and to calculate a different number (e.g., more or less than K) of directional errors for each among a different one or more among the frequency components.

For each among the plurality F of frequency components, task T300 selects a candidate direction. Task T300 may be implemented to perform a corresponding instance of task T30 for each among the plurality F of frequency components.

The energy spectrum of voiced speech (e.g., vowel sounds) tends to have local peaks at harmonics of the pitch frequency. The energy spectrum of background noise, on the other hand, tends to be relatively unstructured. Consequently, components of the input channels at harmonics of the pitch frequency may be expected to have a higher signal-to-noise ratio (SNR) than other components. It may be desirable to configure method M20 to consider only frequency components that correspond to multiples of an estimated pitch frequency.

Typical pitch frequencies range from about 70 to 100 Hz for a male speaker to about 150 to 200 Hz for a female speaker. The current pitch frequency may be estimated by calculating the pitch period as the distance between adjacent pitch peaks (e.g., in a primary microphone channel). A sample of an input channel may be identified as a pitch peak based on a measure of its energy (e.g., based on a ratio between sample energy and frame average energy) and/or a measure of how well a neighborhood of the sample is correlated with a similar neighborhood of a known pitch peak. A pitch estimation procedure is described, for example, in section 4.6.3 (pp. 4-44 to 4-49) of EVRC (Enhanced Variable Rate Codec) document C.S0014-C, available online at www.3gpp.org. A current estimate of the pitch frequency (e.g., in the form of an estimate of the pitch period or "pitch lag") will typically already be available in applications that include speech encoding and/or decoding (e.g., voice communications using codecs that include pitch estimation, such as code-excited linear prediction (CELP) and prototype waveform interpolation (PWI)).

It may be desirable, for example, to configure task T100 such that at least twenty-five, fifty or seventy-five percent of the calculated channel differences (e.g., phase differences) correspond to multiples of an estimated pitch frequency. The same principle may be applied to other desired harmonic signals as well. In a related method, task T100 is implemented to calculate phase differences for each of the frequency components of at least a subband of the channel pair, and task T200 is implemented to calculate directional errors based on only those phase differences which correspond to multiples of an estimated pitch frequency.
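One way such a pitch-based restriction of the frequency set F might be realized is sketched below; the tolerance of one bin on either side of each harmonic and the FFT parameters are illustrative assumptions, not values specified by this disclosure.

```python
import numpy as np

def pitch_harmonic_bins(pitch_hz, sample_rate, fft_size, max_bin, tol_bins=1):
    """Return FFT bin indices near multiples of an estimated pitch frequency.

    pitch_hz:    current pitch estimate (e.g., 70-200 Hz for speech).
    sample_rate: sampling rate in Hz.
    fft_size:    FFT length used for the per-bin channel differences.
    max_bin:     highest bin to consider (e.g., top of the voice band).
    tol_bins:    how many bins on either side of a harmonic to keep.
    """
    bin_hz = sample_rate / fft_size
    harmonics = np.arange(pitch_hz, (max_bin + 1) * bin_hz, pitch_hz)
    centers = np.round(harmonics / bin_hz).astype(int)
    keep = set()
    for c in centers:
        for b in range(c - tol_bins, c + tol_bins + 1):
            if 1 <= b <= max_bin:
                keep.add(b)
    return sorted(keep)

# Example: 16 kHz sampling, 512-point FFT, speaker with pitch near 180 Hz.
print(pitch_harmonic_bins(180.0, 16000, 512, max_bin=128))
```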

FIG. 34A shows a flowchart for an implementation M25 of method M20 that includes task T400. Such a method may be used, for example, to indicate a direction of arrival of a source signal, based on information from a pair of channels of a multichannel signal. Based on the F candidate direction selections produced by task T300, task T400 indicates a direction of arrival. For example, task T400 may be implemented to indicate the most frequently selected among the F candidate directions as the direction of arrival. For a case in which the source signals are disjoint in frequency, task T400 may be implemented to indicate more than one direction of arrival (e.g., to indicate a direction for each among more than one source). Method M25 may be iterated over time to indicate one or more directions of arrival for each of a sequence of frames of the multichannel signal.
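A minimal sketch of this flow for one frame of a two-channel signal is shown below: per-bin candidate selection (in the spirit of tasks T200 and T300) followed by indication of the most frequently selected candidate (one way to realize task T400). The input of per-bin observed angles and the regularization value lam are assumptions made for illustration.

```python
import numpy as np

def indicate_doa(theta_obs_f, candidates, lam=1e-2):
    """Select a per-bin candidate direction and indicate a per-frame DOA.

    theta_obs_f: shape (num_freqs,), observed angle per frequency bin
                 (derived from a per-bin channel difference, cf. task T100).
    candidates:  shape (num_candidates,), inventory of candidate directions.
    Returns (per_bin_choice, doa), where per_bin_choice holds the selected
    candidate index for each bin and doa is the most frequently selected
    candidate direction.
    """
    # Directional errors for every (bin, candidate) combination (cf. task T200).
    err = (theta_obs_f[:, None] - candidates[None, :]) ** 2
    # Select the candidate with the largest likelihood per bin (cf. task T300).
    likelihood = 1.0 / (err + lam)
    per_bin_choice = likelihood.argmax(axis=1)
    # Indicate the most frequently selected candidate (cf. task T400).
    doa = candidates[np.bincount(per_bin_choice).argmax()]
    return per_bin_choice, doa
```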

A microphone pair having a large spacing is typically not suitable for high frequencies, because spatial aliasing begins at a low frequency for such a pair. A DOA estimation approach as described herein, however, allows the use of phase delay measurements beyond the frequency at which phase wrapping begins, and even up to the Nyquist frequency (i.e., half of the sampling rate). By relaxing the spatial aliasing constraint, such an approach enables the use of microphone pairs having larger inter-microphone spacing. As an array with a large inter-microphone distance typically provides better directivity at low frequencies than an array with a small inter-microphone distance, use of a larger array typically extends the range of useful phase delay measurements into lower frequencies as well.

The DOA estimation principles described herein may be extended to multiple microphone pairs MC10 a, MC10 b, MC10 c in a linear array (e.g., as shown in FIG. 2B). One example of such an application for a far-field scenario is a linear array of microphones MC10 a-e arranged along the margin of a television TV10 or other large-format video display screen (e.g., as shown in FIG. 4B). It may be desirable to configure such an array to have a non-uniform (e.g., logarithmic) spacing between microphones, as in the examples of FIGS. 2B and 4B.

For a far-field source, the multiple microphone pairs of a linear array will have essentially the same DOA. Accordingly, one option is to estimate the DOA as an average of the DOA estimates from two or more pairs in the array. However, an averaging scheme may be affected by mismatch of even a single one of the pairs, which may reduce DOA estimation accuracy. Alternatively, it may be desirable to select, from among two or more pairs of microphones of the array, the best microphone pair for each frequency (e.g., the pair that gives the minimum error at that frequency), such that different microphone pairs may be selected for different frequency bands. At the spatial aliasing frequency of a microphone pair, the error will be large. Consequently, such an approach will tend to automatically avoid a microphone pair when the frequency is close to its wrapping frequency, thus avoiding the related uncertainty in the DOA estimate. For higher-frequency bins, a pair having a shorter distance between the microphones will typically provide a better estimate and may be automatically favored, while for lower-frequency bins, a pair having a larger distance between the microphones will typically provide a better estimate and may be automatically favored. In the four-microphone example shown in FIG. 2B, six different pairs of microphones are possible (i.e.,

$ {\begin{pmatrix}4 \\2\end{pmatrix} = 6} ).$

In one example, the best pair for each axis is selected by calculating, for each frequency f, P×I values, where P is the number of pairs, I is the size of the inventory, and each value e_(pi) is the squared absolute difference between the observed angle θ_(pf) (for pair p and frequency f) and the candidate angle θ_(if). For each frequency f, the pair p that corresponds to the lowest error value e_(pi) is selected. This error value also indicates the best DOA candidate θ_(if) at frequency f (as shown in FIG. 6A).
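The following sketch illustrates this selection: it forms the P×I error values for every frequency bin and, for each bin, picks the pair and candidate with the lowest error. The array shapes are assumptions made for illustration.

```python
import numpy as np

def select_pair_and_doa(theta_obs, candidates):
    """Select the best microphone pair and DOA candidate per frequency bin.

    theta_obs:  shape (P, F), observed angle for each pair p and frequency f.
    candidates: shape (I,), inventory of candidate angles.
    Returns (best_pair, best_candidate), each of shape (F,).
    """
    # e[p, i, f] = squared difference between observation and candidate angle.
    err = (theta_obs[:, None, :] - candidates[None, :, None]) ** 2
    P, I, F = err.shape
    # For each frequency bin, find the joint (pair, candidate) minimum error.
    flat = err.reshape(P * I, F)
    idx = flat.argmin(axis=0)
    best_pair = idx // I
    best_candidate = idx % I
    return best_pair, best_candidate
```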

FIG. 34B shows a flowchart for an implementation M30 of method M10 that includes an implementation T150 of task T10 and an implementation T250 of task T20. Method M30 may be used, for example, to indicate a candidate direction for a frequency component of the multichannel signal (e.g., at a particular frame).

For each among a plurality P of pairs of channels of the multichannel signal, task T250 calculates a plurality of directional errors. Task T250 may be implemented to calculate K directional errors for each channel pair. For example, task T250 may be implemented to perform a corresponding instance of task T20 for each among the plurality P of channel pairs. Alternatively, task T250 may be implemented to calculate K directional errors for each among one or more of the channel pairs, and to calculate a different number (e.g., more or less than K) of directional errors for each among a different one or more among the channel pairs.

Method M30 also includes a task T35 that selects a candidate direction, based on the pluralities of directional errors. For example, task T35 may be implemented to select the candidate direction that corresponds to the lowest among the directional errors.

FIG. 34C shows a flowchart for an implementation M100 of method M30 that includes an implementation T170 of tasks T100 and T150, an implementation T270 of tasks T200 and T250, and an implementation T350 of task T35. Method M100 may be used, for example, to select a candidate direction for each among a plurality F of frequency components of the multichannel signal (e.g., at a particular frame).

For each among the plurality F of frequency components, task T170 calculates a plurality P of differences, where each among the plurality P of differences corresponds to a different pair of channels of the multichannel signal and is a difference between the channels of that pair (e.g., a gain-based or phase-based difference). For each among the plurality F of frequency components, task T270 calculates a plurality of directional errors for each among the plurality P of pairs. For example, task T270 may be implemented to calculate, for each of the frequency components, K directional errors for each of the P pairs, or a total of P×K directional errors for each frequency component. For each among the plurality F of frequency components, and based on the corresponding pluralities of directional errors, task T350 selects a corresponding candidate direction.

FIG. 35A shows a flowchart for an implementation M110 of method M100. The implementation M110 may include tasks T170, T270, T350 and T400, which may be examples of corresponding elements described in connection with at least one of FIG. 34A and FIG. 34C.

FIG. 35B shows a block diagram of an apparatus A5 according to a general configuration that includes an error calculator 200 and a selector 300. Error calculator 200 is configured to calculate, for a calculated difference between a pair of channels of a multichannel signal and for each among a plurality K of candidate directions, a corresponding directional error that is based on the calculated difference (e.g., as described herein with reference to implementations of task T20). Selector 300 is configured to select a candidate direction, based on the corresponding directional error (e.g., as described herein with reference to implementations of task T30).

FIG. 35C shows a block diagram of an implementation A10 of apparatus A5that includes a difference calculator 100. Apparatus A10 may beimplemented, for example, to perform an instance of method M10, M20,M30, and/or M100 as described herein. Calculator 100 is configured tocalculate a difference (e.g., a gain-based or phase-based difference)between a pair of channels of a multichannel signal (e.g., as describedherein with reference to implementations of task T10). Calculator 100may be implemented, for example, to calculate such a difference for eachamong a plurality F of frequency components of the multichannel signal.In such case, calculator 100 may also be implemented to apply a subbandfilter bank to the signal and/or to calculate a frequency transform ofeach channel (e.g., a fast Fourier transform (FFT) or modified discretecosine transform (MDCT)) before calculating the difference.

FIG. 35D shows a block diagram of an implementation A15 of apparatus A10that includes an indicator 400. Indicator 400 is configured to indicatea direction of arrival, based on a plurality of candidate directionselections produced by selector 300 (e.g., as described herein withreference to implementations of task T400). Apparatus A15 may beimplemented, for example, to perform an instance of method M25 and/orM110 as described herein.

FIG. 36A shows a block diagram of an apparatus MF5 according to ageneral configuration. Apparatus MF5 includes means F20 for calculating,for a calculated difference between a pair of channels of a multichannelsignal and for each among a plurality K of candidate directions, acorresponding directional error or fitness measure that is based on thecalculated difference (e.g., as described herein with reference toimplementations of task T20). Apparatus MF5 also includes means F30 forselecting a candidate direction, based on the corresponding directionalerror (e.g., as described herein with reference to implementations oftask T30).

FIG. 36B shows a block diagram of an implementation MF10 of apparatusMF5 that includes means F10 for calculating a difference (e.g., again-based or phase-based difference) between a pair of channels of amultichannel signal (e.g., as described herein with reference toimplementations of task T10). Means F10 may be implemented, for example,to calculate such a difference for each among a plurality F of frequencycomponents of the multichannel signal. In such case, means F10 may alsobe implemented to include means for performing a subband analysis and/orcalculating a frequency transform of each channel (e.g., a fast Fouriertransform (FFT) or modified discrete cosine transform (MDCT)) beforecalculating the difference. Apparatus MF10 may be implemented, forexample, to perform an instance of method M10, M20, M30, and/or M100 asdescribed herein.

FIG. 36C shows a block diagram of an implementation MF15 of apparatusMF10 that includes means F40 for indicating a direction of arrival,based on a plurality of candidate direction selections produced by meansF30 (e.g., as described herein with reference to implementations of taskT400). Apparatus MF15 may be implemented, for example, to perform aninstance of method M25 and/or M110 as described herein.

The signals received by a microphone pair may be processed as described herein to provide an estimated DOA, over a range of up to 180 degrees, with respect to the axis of the microphone pair. The desired angular span and resolution may be arbitrary within that range (e.g., uniform (linear) or non-uniform (nonlinear), limited to selected sectors of interest, etc.). Additionally or alternatively, the desired frequency span and resolution may be arbitrary (e.g., linear, logarithmic, mel-scale, Bark-scale, etc.).

In the model as shown in FIG. 2B, each DOA estimate between 0 and +/−90 degrees from a microphone pair indicates an angle relative to a plane that is orthogonal to the axis of the pair. Such an estimate describes a cone around the axis of the pair, and the actual direction of the source along the surface of this cone is indeterminate. For example, a DOA estimate from a single microphone pair does not indicate whether the source is in front of or behind (or above or below) the microphone pair. Therefore, while more than two microphones may be used in a linear array to improve DOA estimation performance across a range of frequencies, the range of DOA estimation supported by a linear array is typically limited to 180 degrees.

The DOA estimation principles described herein may also be extended to a two-dimensional (2-D) array of microphones. For example, a 2-D array may be used to extend the range of source DOA estimation up to a full 360° (e.g., providing a similar range as in applications such as radar and biomedical scanning). Such an array may be used in a speakerphone application, for example, to support good performance even for arbitrary placement of the telephone relative to one or more sources.

The multiple microphone pairs of a 2-D array typically will not share the same DOA, even for a far-field point source. For example, source height relative to the plane of the array (e.g., in the z-axis) may play an important role in 2-D tracking. FIG. 10A shows an example of a speakerphone application in which the x-y plane as defined by the microphone axes is parallel to a surface (e.g., a tabletop) on which the telephone is placed. In this example, the source 1001 is a person speaking from a location that is along the x axis 1010 but is offset in the direction of the z axis 1014 (e.g., the speaker's mouth is above the tabletop). With respect to the x-y plane as defined by the microphone array, the direction of the source 1001 is along the x axis 1010, as shown in FIG. 10A. The microphone pair along the y axis 1012 estimates a DOA of the source as zero degrees from the x-z plane. Due to the height of the speaker above the x-y plane, however, the microphone pair along the x axis estimates a DOA of the source as 30° from the x axis 1010 (i.e., 60 degrees from the y-z plane), rather than along the x axis 1010. FIGS. 11A and 11B show two views of the cone of confusion CY10 associated with this DOA estimate, which causes an ambiguity in the estimated speaker direction with respect to the microphone axis. FIG. 37A shows another example of a point source 3720 (i.e., a speaker's mouth) that is elevated above a plane of the device H100 (e.g., a display plane and/or a plane defined by microphone array axes).

An expression such as

$\lbrack {{\tan^{- 1}( \frac{\sin \; \theta_{1}}{\sin \; \theta_{2}} )},{\tan^{- 1}( \frac{\sin \; \theta_{2}}{\sin \; \theta_{1}} )}} \rbrack,$

where θ₁ and θ₂ are the estimated DOA for pair 1 and 2, respectively, may be used to project all pairs of DOAs to a 360° range in the plane in which the three microphones are located. Such projection may be used to enable tracking directions of active speakers over a 360° range around the microphone array, regardless of height difference. Applying the expression above to project the DOA estimates (0°, 60°) of FIG. 10A into the x-y plane produces

${\lbrack {{\tan^{- 1}( \frac{\sin \; 0{^\circ}}{\sin \; 60{^\circ}} )},{\tan^{- 1}( \frac{\sin \; 60{^\circ}}{\sin \; 0{^\circ}} )}} \rbrack = ( {{0{^\circ}},{90{^\circ}}} )},$

which may be mapped to a combined directional estimate 1022 (e.g., an azimuth) of 270° as shown in FIG. 10B.

In a typical use case, the source will be located in a direction that is not projected onto a microphone axis. FIGS. 12A-12D show such an example in which the source S01 is located above the plane of the microphones MC10, MC20, MC30. In this example, the DOA of the source signal passes through the point (x, y, z)=(5, 2, 5). FIG. 12A shows the x-y plane as viewed from the +z direction. FIGS. 12B and 12D show the x-z plane as viewed from the direction of microphone MC30, and FIG. 12C shows the y-z plane as viewed from the direction of microphone MC10. The shaded area in FIG. 12A indicates the cone of confusion CY associated with the DOA θ₁ as observed by the y-axis microphone pair MC20-MC30, and the shaded area in FIG. 12B indicates the cone of confusion CX associated with the DOA θ₂ of source S01 as observed by the x-axis microphone pair MC10-MC20. In FIG. 12C, the shaded area indicates cone CY, and the dashed circle indicates the intersection of cone CX with a plane that passes through the source and is orthogonal to the x axis. The two dots on this circle that indicate its intersection with cone CY are the candidate locations of the source. Likewise, in FIG. 12D the shaded area indicates cone CX, the dashed circle indicates the intersection of cone CY with a plane that passes through the source and is orthogonal to the y axis, and the two dots on this circle that indicate its intersection with cone CX are the candidate locations of the source. It may be seen that in this 2-D case, an ambiguity remains with respect to whether the source is above or below the x-y plane.

For the example shown in FIGS. 12A-12D, the DOA observed by the x-axis microphone pair MC10-MC20 is θ₂=tan⁻¹(−5/√(25+4))≈−42.9°, and the DOA observed by the y-axis microphone pair MC20-MC30 is θ₁=tan⁻¹(−2/√(25+25))≈−15.8°. Using the expression

$\lbrack {{\tan^{- 1}( \frac{\sin \; \theta_{1}}{\sin \; \theta_{2}} )},{\tan^{- 1}( \frac{\sin \; \theta_{2}}{\sin \; \theta_{1}} )}} \rbrack$

to project these directions into the x-y plane produces the magnitudes (21.8°, 68.2°) of the desired angles relative to the x and y axes, respectively, which corresponds to the given source location (x, y, z)=(5, 2, 5). The signs of the observed angles indicate the x-y quadrant in which the source (e.g., as indicated by the microphones MC10, MC20 and MC30) is located, as shown in FIG. 11C.

In fact, almost complete 3-D information is given by a 2-D microphone array, except for the up-down confusion. For example, the directions of arrival observed by microphone pairs MC10-MC20 and MC20-MC30 may also be used to estimate the magnitude of the angle of elevation of the source relative to the x-y plane. If d denotes the vector from microphone MC20 to the source, then the lengths of the projections of vector d onto the x-axis, the y-axis, and the x-y plane may be expressed as d sin(θ₂), d sin(θ₁) and d√(sin²(θ₁)+sin²(θ₂)), respectively. The magnitude of the angle of elevation may then be estimated as θ̂_(h)=cos⁻¹√(sin²(θ₁)+sin²(θ₂)).
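A short sketch of this projection and elevation estimate is given below, applied to the source location (5, 2, 5) of FIGS. 12A-12D. The function name and sign handling (magnitudes only, with quadrant taken from the signs of the inputs) are assumptions made for illustration.

```python
import numpy as np

def project_to_plane(theta1_deg, theta2_deg):
    """Project two 1-D DOA estimates into the x-y plane and estimate elevation.

    theta1_deg: DOA observed by the y-axis pair (degrees).
    theta2_deg: DOA observed by the x-axis pair (degrees).
    Returns (angle_from_x, angle_from_y, elevation) in degrees (magnitudes only).
    """
    s1 = np.sin(np.deg2rad(theta1_deg))
    s2 = np.sin(np.deg2rad(theta2_deg))
    angle_from_x = np.degrees(np.arctan2(abs(s1), abs(s2)))
    angle_from_y = np.degrees(np.arctan2(abs(s2), abs(s1)))
    # Elevation magnitude: cos^-1 of the length of the in-plane projection.
    elevation = np.degrees(np.arccos(np.hypot(s1, s2)))
    return angle_from_x, angle_from_y, elevation

# Source at (x, y, z) = (5, 2, 5) as in FIGS. 12A-12D:
theta2 = np.degrees(np.arctan2(5, np.sqrt(25 + 4)))   # x-axis pair, ~42.9 deg
theta1 = np.degrees(np.arctan2(2, np.sqrt(25 + 25)))  # y-axis pair, ~15.8 deg
print(project_to_plane(theta1, theta2))               # ~ (21.8, 68.2, 42.9)
```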

Although the microphone pairs in the particular examples of FIGS. 10A-10B and 12A-12D have orthogonal axes, it is noted that for microphone pairs having non-orthogonal axes, the expression

$\lbrack {{\tan^{- 1}( \frac{\sin \; \theta_{1}}{\sin \; \theta_{2}} )},{\tan^{- 1}( \frac{\sin \; \theta_{2}}{\sin \; \theta_{1}} )}} \rbrack$

may be used to project the DOA estimates to those non-orthogonal axes, and from that point it is straightforward to obtain a representation of the combined directional estimate with respect to orthogonal axes. FIG. 37B shows an example of the intersecting cones of confusion C1, C2 associated with the responses of microphone arrays having non-orthogonal axes (as shown) to a common point source. FIG. 37C shows one of the lines of intersection L1 of these cones C1, C2, which defines one of two possible directions of the point source with respect to the array axes in three dimensions.

FIG. 13A shows an example of microphone array MC10, MC20, MC30 in which the axis 1 of pair MC20, MC30 lies in the x-y plane and is skewed relative to the y axis by a skew angle θ₀. FIG. 13B shows an example of obtaining a combined directional estimate in the x-y plane with respect to orthogonal axes x and y with observations (θ₁, θ₂) from an array of microphones MC10, MC20, MC30 as shown in FIG. 13A. If d denotes the vector from microphone MC20 to the source, then the lengths of the projections of vector d onto the x-axis and axis 1 may be expressed as d sin(θ₂) and d sin(θ₁), respectively. The vector (x, y) denotes the projection of vector d onto the x-y plane. The estimated value of x is known, and it remains to estimate the value of y.

The estimation of y may be performed using the projection p₁=(d sin θ₁ sin θ₀, d sin θ₁ cos θ₀) of vector (x, y) onto axis 1. Observing that the difference between vector (x, y) and vector p₁ is orthogonal to p₁, we calculate y as

$y = d\,\frac{\sin\theta_{1} - \sin\theta_{2}\sin\theta_{0}}{\cos\theta_{0}}.$

The desired angles of arrival in the x-y plane, relative to the orthogonal x and y axes, may then be expressed respectively as

$( {{\tan^{- 1}( \frac{y}{x} )},{\tan^{- 1}( \frac{x}{y} )}} ) = {\begin{pmatrix}{{\tan^{- 1}( \frac{{\sin \; \theta_{1}} - {\sin \; \theta_{2}\sin \; \theta_{0}}}{\sin \; \theta_{2}\cos \; \theta_{2}} )},} \\{\tan^{- 1}( \frac{\sin \; \theta_{2}\cos \; \theta_{0}}{{\sin \; \theta_{1}} - {\sin \; \theta_{2}\sin \; \theta_{0}}} )}\end{pmatrix}.}$

Extension of DOA estimation to a 2-D array is typically well-suited to and sufficient for a speakerphone application. However, further extension to an N-dimensional array is also possible and may be performed in a straightforward manner. For tracking applications in which one target is dominant, it may be desirable to select N pairs for representing N dimensions. Once a 2-D result is obtained with a particular microphone pair, another available pair can be utilized to increase degrees of freedom. For example, FIGS. 12A-12D, 13A and 13B illustrate use of observed DOA estimates from different microphone pairs in the x-y plane to obtain an estimate of the source direction as projected into the x-y plane. In the same manner, observed DOA estimates from an x-axis microphone pair and a z-axis microphone pair (or other pairs in the x-z plane) may be used to obtain an estimate of the source direction as projected into the x-z plane, and likewise for the y-z plane or any other plane that intersects three or more of the microphones.

Estimates of DOA error from different dimensions may be used to obtain a combined likelihood estimate, for example, using an expression such as

$\frac{1}{\max\!\left(\left\|\theta - \theta_{0,1}\right\|_{f,1}^{2},\ \left\|\theta - \theta_{0,2}\right\|_{f,2}^{2}\right) + \lambda} \quad \text{or} \quad \frac{1}{\operatorname{mean}\!\left(\left\|\theta - \theta_{0,1}\right\|_{f,1}^{2},\ \left\|\theta - \theta_{0,2}\right\|_{f,2}^{2}\right) + \lambda},$

where θ_(0,i) denotes the DOA candidate selected for pair i. Use of the maximum among the different errors may be desirable to promote selection of an estimate that is close to the cones of confusion of both observations, in preference to an estimate that is close to only one of the cones of confusion and may thus indicate a false peak. Such a combined result may be used to obtain a (frame, angle) plane, as shown in FIG. 8 and described herein, and/or a (frame, frequency) plot, as shown at the bottom of FIG. 9 and described herein.
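A minimal sketch of such a combined likelihood is shown below; the two-pair signature, the regularization value lam, and the mean alternative are assumptions made for illustration.

```python
import numpy as np

def combined_likelihood(err1, err2, lam=1e-2, use_max=True):
    """Combine per-pair squared DOA errors into a single likelihood.

    err1, err2: squared errors |theta - theta_0,i|^2 for pair 1 and pair 2
                (scalars or arrays of matching shape, e.g., per frequency bin).
    Using the maximum of the two errors favors directions that are close to
    the cones of confusion of BOTH observations, which helps reject false
    peaks that fit only one of the cones.
    """
    combined = np.maximum(err1, err2) if use_max else 0.5 * (err1 + err2)
    return 1.0 / (combined + lam)
```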

The DOA estimation principles described herein may be used to support selection among multiple speakers. For example, location of multiple sources may be combined with a manual selection of a particular speaker (e.g., push a particular button to select a particular corresponding user) or automatic selection of a particular speaker (e.g., by speaker recognition). In one such application, a telephone is configured to recognize the voice of its owner and to automatically select a direction corresponding to that voice in preference to the directions of other sources.

For a one-dimensional (1-D) array of microphones, a direction of arrival DOA10 for a source may be easily defined in a range of, for example, −90° to 90°. For example, it is easy to obtain a closed-form solution for the direction of arrival DOA10 across a range of angles (e.g., as shown in cases 1 and 2 of FIG. 13C) in terms of phase differences among the signals produced by the various microphones of the array.

For an array that includes more than two microphones at arbitrary relative locations (e.g., a non-coaxial array), it may be desirable to use a straightforward extension of one-dimensional principles as described above, e.g., (θ1, θ2) in a two-pair case in two dimensions, (θ1, θ2, θ3) in a three-pair case in three dimensions, etc. A key problem is how to apply spatial filtering to such a combination of paired 1-D direction of arrival DOA10 estimates. For example, it may be difficult or impractical to obtain a closed-form solution for the direction of arrival DOA10 across a range of angles for a non-coaxial array (e.g., as shown in cases 3 and 4 of FIG. 13C) in terms of phase differences among the signals produced by the various microphones of the array.

FIG. 14A shows an example of a straightforward one-dimensional (1-D) pairwise beamforming-nullforming (BFNF) BF10 configuration for spatially selective filtering that is based on robust 1-D DOA estimation. In this example, the notation d_(i,j) ^(k) denotes microphone pair number i, microphone number j within the pair, and source number k, such that each pair [d_(i,1) ^(k) d_(i,2) ^(k)]^(T) represents a steering vector for the respective source and microphone pair (the ellipse indicates the steering vector for source 1 and microphone pair 1), and λ denotes a regularization factor. The number of sources is not greater than the number of microphone pairs. Such a configuration avoids a need to use all of the microphones at once to define a DOA.

We may apply a beamformer/null beamformer (BFNF) BF10 as shown in FIG. 14A by augmenting the steering vector for each pair. In this figure, A^(H) denotes the conjugate transpose of A, x denotes the microphone channels and y denotes the spatially filtered channels. Using a pseudo-inverse operation A⁺=(A^(H)A)⁻¹A^(H) as shown in FIG. 14A allows the use of a non-square matrix. For a three-microphone MC10, MC20, MC30 case (i.e., two microphone pairs) as illustrated in FIG. 15A, for example, the number of rows is 2×2=4 instead of 3, such that the additional row makes the matrix non-square.

As the approach shown in FIG. 14A is based on robust 1-D DOA estimation, complete knowledge of the microphone geometry is not required, and DOA estimation using all microphones at the same time is also not required. Such an approach is well-suited for use with anglogram-based DOA estimation as described herein, although any other 1-D DOA estimation method can also be used. FIG. 14B shows an example of the BFNF BF10 as shown in FIG. 14A which also includes a normalization N10 (i.e., by the denominator) to prevent an ill-conditioned inversion at the spatial aliasing frequency (i.e., the frequency at which the wavelength is twice the distance between the microphones).

FIG. 15B shows an example of a pair-wise (PW) normalized MVDR (minimum variance distortionless response) BFNF BF10, in which the manner in which the steering vector (array manifold vector) is obtained differs from the conventional approach. In this case, a common channel is eliminated due to sharing of a microphone between the two pairs (e.g., the microphone labeled as x_(1,2) and x_(2,1) in FIG. 15A). The noise coherence matrix Γ may be obtained either by measurement or by theoretical calculation using a sinc function. It is noted that the examples of FIGS. 14A, 14B, and 15B may be generalized to an arbitrary number of sources N such that N≤M, where M is the number of microphones.

FIG. 16A shows another example of a BFNF BF10 that may be used if the matrix A^(H)A is not ill-conditioned, which may be determined using a condition number or determinant of the matrix. In this example, the notation is as in FIG. 14A, and the number of sources N is not greater than the number of microphone pairs M. If the matrix is ill-conditioned, it may be desirable to bypass one microphone signal for that frequency bin for use as the source channel, while continuing to apply the method to spatially filter other frequency bins in which the matrix A^(H)A is not ill-conditioned. This option saves computation for calculating a denominator for normalization. The methods in FIGS. 14A-16A demonstrate BFNF BF10 techniques that may be applied independently at each frequency bin. The steering vectors are constructed using the DOA estimates for each frequency and microphone pair as described herein. For example, each element of the steering vector for pair p and source n for DOA θ_(i), frequency f, and microphone number m (1 or 2) may be calculated as

$d_{p,m}^{n} = \exp\!\left(\frac{-j\,\omega\,f_{s}\,(m-1)\,l_{p}}{c}\cos\theta_{i}\right),$

where l_(p) indicates the distance between the microphones of pair p, ω indicates the frequency bin number, and f_(s) indicates the sampling frequency. FIG. 16B shows examples of steering vectors SV10 a-b for an array as shown in FIG. 15A.
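The sketch below illustrates the general idea of a pairwise BFNF at one frequency bin: a stacked steering matrix is built from per-pair DOA estimates and applied through a regularized pseudo-inverse. For readability the phase term is written with a frequency in hertz rather than the bin-number/sampling-frequency form given above, and the regularization and shapes are assumptions made for illustration rather than the exact construction shown in FIGS. 14A-16A.

```python
import numpy as np

def pairwise_bfnf(x_pairs, doas_deg, pair_spacings_m, freq_hz, c=343.0, lam=1e-3):
    """Pairwise beamformer/nullformer applied at a single frequency bin.

    x_pairs:         shape (P, 2), the two microphone channels of each pair
                     (complex STFT values at this bin).
    doas_deg:        shape (N, P), DOA estimate of source n as seen by pair p.
    pair_spacings_m: shape (P,), inter-microphone distance of each pair.
    Returns the N spatially filtered source channels at this bin.
    """
    P = len(pair_spacings_m)
    N = doas_deg.shape[0]
    # Stacked steering matrix A of shape (2P, N): each pair contributes a
    # 2-element steering vector [1, exp(-j*phase)] for each source.
    A = np.zeros((2 * P, N), dtype=complex)
    for p in range(P):
        for n in range(N):
            phase = (2 * np.pi * freq_hz * pair_spacings_m[p]
                     * np.cos(np.deg2rad(doas_deg[n, p])) / c)
            A[2 * p:2 * p + 2, n] = [1.0, np.exp(-1j * phase)]
    x = x_pairs.reshape(2 * P)
    # Regularized pseudo-inverse: y = (A^H A + lam*I)^-1 A^H x (non-square A).
    AhA = A.conj().T @ A + lam * np.eye(N)
    return np.linalg.solve(AhA, A.conj().T @ x)
```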

A PWBFNF scheme may be used for suppressing the direct path of interferers up to the available degrees of freedom (instantaneous suppression without a smooth trajectory assumption, additional noise-suppression gain using directional masking, additional noise-suppression gain using bandwidth extension). Single-channel post-processing of the quadrant framework may be used for stationary noise and noise-reference handling.

It may be desirable to obtain instantaneous suppression but also to provide minimization of artifacts such as musical noise. It may be desirable to maximally use the available degrees of freedom for BFNF. One DOA may be fixed across all frequencies, or a slightly mismatched alignment across frequencies may be permitted. Only the current frame may be used, or a feed-forward network may be implemented. The BFNF may be set for all frequencies in the range up to the Nyquist rate (e.g., except ill-conditioned frequencies). A natural masking approach may be used (e.g., to obtain a smooth, natural, seamless transition of aggressiveness). FIG. 31 shows an example of DOA tracking for a target and a moving interferer for a scenario as shown in FIGS. 21B and 22. In FIG. 31 a fixed source S10 at D is indicated, and a moving source S20 is also indicated.

FIG. 17 shows a flowchart for one example of an integrated method 1700 as described herein. This method includes an inventory matching task T10 for phase delay estimation, an error calculation task T20 to obtain DOA error values, a dimension-matching and/or pair-selection task T30, and a task T40 to map DOA error for the selected DOA candidate to a source activity likelihood estimate. The pair-wise DOA estimation results may also be used to track one or more active speakers, to perform a pair-wise spatial filtering operation, and/or to perform time- and/or frequency-selective masking. The activity likelihood estimation and/or spatial filtering operation may also be used to obtain a noise estimate to support a single-channel noise suppression operation. FIGS. 18 and 19 show an example of observations obtained using a 2-D microphone arrangement to track movement of a source (e.g., a human speaker) among directions A-B-C-D as shown in FIG. 21A. As depicted in FIG. 21A, three microphones MC10, MC20, MC30 may be used to record an audio signal. In this example, FIG. 18 shows observations A-D by the y-axis pair MC20-MC30, where distance dx is 3.6 centimeters; FIG. 19 shows observations A-D by the x-axis pair MC10-MC20, where distance dy is 7.3 centimeters; and the inventory of DOA estimates covers the range of −90 degrees to +90 degrees at a resolution of five degrees.

It may be understood that when the source is in an endfire direction of a microphone pair, elevation of the source above or below the plane of the microphones limits the observed angle. Consequently, when the source is outside the plane of the microphones, it is typical that no real endfire is observed. It may be seen in FIGS. 18 and 19 that due to elevation of the source with respect to the microphone plane, the observed directions do not reach −90 degrees even as the source passes through the corresponding endfire direction (i.e., direction A for the x-axis pair MC10-MC20, and direction B for the y-axis pair MC20-MC30).

FIG. 20 shows an example in which +/−90-degree observations A-D from orthogonal axes, as shown in FIGS. 18 and 19 for a scenario as shown in FIG. 21A, are combined to produce DOA estimates in the microphone plane over a range of zero to 360 degrees. In this example, a one-degree resolution is used. FIG. 22 shows an example of combined observations A-D using a 2-D microphone arrangement, where distance dx is 3.6 centimeters and distance dy is 7.3 centimeters, to track movement, by microphones MC10, MC20, MC30, of a source (e.g., a human speaker) among directions A-B-C as shown in FIG. 21B in the presence of another source (e.g., a stationary human speaker) at direction D.

As described above, a DOA estimate may be calculated based on a sum of likelihoods. When combining observations from different microphone axes (e.g., as shown in FIG. 20), it may be desirable to perform the combination for each individual frequency bin before calculating a sum of likelihoods, especially if more than one directional source may be present (e.g., two speakers, or a speaker and an interferer). Assuming that no more than one of the sources is dominant at each frequency bin, calculating a combined observation for each frequency component preserves the distinction between dominance of different sources at different corresponding frequencies. If a summation over frequency bins dominated by different sources is performed on the observations before they are combined, then this distinction may be lost, and the combined observations may indicate spurious peaks at directions which do not correspond to the location of any actual source. For example, summing observations from orthogonal microphone pairs of a first source at 45 degrees and a second source at 225 degrees, and then combining the summed observations, may produce spurious peaks at 135 and 315 degrees in addition to the desired peaks at 45 and 225 degrees.
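The sketch below illustrates the per-bin ordering: each frequency bin's pair of 1-D observations is first combined into a single 360-degree azimuth (using a sign-based mapping like the one given later as expression (1)), and only then accumulated across frequency. The histogram form, resolution, and handling of the zero-angle case are assumptions made for illustration.

```python
import numpy as np

def combined_histogram(theta_x_bins, theta_y_bins, resolution_deg=1.0):
    """Histogram of per-bin combined (360-degree) DOA estimates for one frame.

    theta_x_bins, theta_y_bins: per-frequency 1-D DOA estimates (degrees,
    -90..+90) from the x-axis and y-axis pairs. Each bin is combined into a
    single azimuth BEFORE any summation over frequency, so two simultaneous
    sources (e.g., at 45 and 225 degrees) keep their own peaks.
    """
    azimuth = np.where(theta_x_bins > 0, theta_y_bins, 180.0 - theta_y_bins)
    azimuth = np.mod(azimuth, 360.0)
    edges = np.arange(0.0, 360.0 + resolution_deg, resolution_deg)
    hist, _ = np.histogram(azimuth, bins=edges)
    return hist
```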

FIGS. 23 and 24 show an example of combined observations for a conference call scenario, as shown in FIG. 25, in which the phone is stationary on a table top. In FIG. 25 a device may include three microphones MC10, MC20, MC30. In FIG. 23, the frame number 2310, an angle of arrival 2312 and an amplitude 2314 of a signal are illustrated. At about frame 5500, speaker 1 stands up, and movement of speaker 1 is evident to about frame 9000. Movement of speaker 3 near frame 9500 is also visible. The rectangle in FIG. 24 indicates a target sector selection TSS10, such that frequency components arriving from directions outside this sector may be rejected or otherwise attenuated, or otherwise processed differently from frequency components arriving from directions within the selected sector. In this example, the target sector is the quadrant of 180-270 degrees and is selected by the user from among the four quadrants of the microphone plane. This example also includes acoustic interference from an air conditioning system.

FIGS. 26 and 27 show an example of combined observations for a dynamic scenario, as shown in FIG. 28A. In FIG. 28A a device may be positioned between a first speaker S10, a second speaker S20 and a third speaker S30. In FIG. 26, the frame number 2610, an angle of arrival 2612 and an amplitude 2614 of a signal are illustrated. In this scenario, speaker 1 picks up the phone at about frame 800 and replaces it on the table top at about frame 2200. Although the angle span is broader when the phone is in this browse-talk position, it may be seen that the spatial response is still centered in a designated DOA. Movement of speaker 2 after about frame 400 is also evident. As in FIG. 24, the rectangle in FIG. 27 indicates user selection of the quadrant of 180-270 degrees as the target sector TSS10. FIGS. 29 and 30 show an example of combined observations for a dynamic scenario with road noise, as shown in FIG. 28B. In FIG. 28B a phone may receive an audio signal from a speaker S10. In FIG. 29, the frame number 2910, an angle of arrival 2912 and an amplitude 2914 of a signal are illustrated. In this scenario, the speaker picks up the phone between about frames 200 and 100 and again between about frames 1400 and 2100. In this example, the rectangle in FIG. 30 indicates user selection of the quadrant of 270-360 degrees as an interference sector IS10.

An anglogram-based technique as described herein may be used to support voice activity detection (VAD), which may be applied for noise suppression in various use cases (e.g., a speakerphone). Such a technique, which may be implemented as a sector-based approach, may include a "vadall" statistic based on a maximum likelihood (likelihood_max) of all sectors. For example, if the maximum is significantly larger than a noise-only threshold, then the value of the vadall statistic is one (otherwise zero). It may be desirable to update the noise-only threshold only during a noise-only period. Such a period may be indicated, for example, by a single-channel VAD (e.g., from a primary microphone channel) and/or a VAD based on detection of speech onsets and/or offsets (e.g., based on a time-derivative of energy for each of a set of frequency components).

Additionally or alternatively, such a technique may include a per-sector "vad[sector]" statistic based on a maximum likelihood of each sector. Such a statistic may be implemented to have a value of one only when the single-channel VAD and the onset-offset VAD are one, vadall is one, and the maximum for the sector is greater than some portion (e.g., 95%) of likelihood_max. This information can be used to select a sector with maximum likelihood. Applicable scenarios include a user-selected target sector with a moving interferer, and a user-selected interference sector with a moving target.
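A compact sketch of these two statistics is given below. The 95% fraction follows the text; the simple threshold comparison, the boolean inputs from the single-channel and onset/offset VADs, and the function signature are assumptions made for illustration.

```python
import numpy as np

def sector_vad(sector_likelihoods, noise_threshold, sc_vad, onset_vad,
               fraction=0.95):
    """Compute the 'vadall' statistic and a per-sector 'vad[sector]' statistic.

    sector_likelihoods: shape (num_sectors,), maximum likelihood observed in
                        each sector for the current frame.
    noise_threshold:    noise-only threshold (updated elsewhere, only during
                        noise-only periods).
    sc_vad, onset_vad:  booleans from a single-channel VAD and an onset/offset
                        VAD for the current frame.
    """
    likelihood_max = sector_likelihoods.max()
    vadall = 1 if likelihood_max > noise_threshold else 0
    vad_sector = [
        1 if (sc_vad and onset_vad and vadall
              and lk > fraction * likelihood_max) else 0
        for lk in sector_likelihoods
    ]
    return vadall, vad_sector
```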

It may be desirable to select a tradeoff between instantaneous tracking (PWBFNF performance) and prevention of too-frequent switching of the interference sector. For example, it may be desirable to combine the vadall statistic with one or more other VAD statistics. The vad[sector] statistic may be used to specify the interference sector and/or to trigger updating of a non-stationary noise reference. It may also be desirable to normalize the vadall statistic and/or a vad[sector] statistic using, for example, a minimum-statistics-based normalization technique (e.g., as described in U.S. Pat. Appl. Publ. No. 2012/0130713, published May 24, 2012).

An anglogram-based technique as described herein may be used to support directional masking, which may be applied for noise suppression in various use cases (e.g., a speakerphone). Such a technique may be used to obtain additional noise-suppression gain by using the DOA estimates to control a directional masking technique (e.g., to pass a target quadrant and/or to block an interference quadrant). Such a method may be useful for handling reverberation and may produce an additional 6-12 dB of gain. An interface from the anglogram may be provided for quadrant masking (e.g., by assigning an angle with maximum likelihood per each frequency bin). It may be desirable to control the masking aggressiveness based on target dominancy, as indicated by the anglogram. Such a technique may be designed to obtain a natural masking response (e.g., a smooth, natural, seamless transition of aggressiveness).

It may be desirable to provide a multi-view graphical user interface(GUI) for source tracking and/or for extension of PW BFNF withdirectional masking. Various examples are presented herein ofthree-microphone (two-pair) two-dimensional (e.g., 360°) source trackingand enhancement schemes which may be applied to a desktop hands-freespeakerphone use case. However, it may be desirable to practice auniversal method to provide seamless coverage of use cases ranging fromthe desktop hands-free to handheld hands-free or even to handset usecases. While a three-microphone scheme may be used for a handheldhands-free use case, it may be desirable to also use a fourth microphone(if already there) on the back of the device. For example, it may bedesirable for at least four microphones (three microphone pairs) to beavailable to represent (x, y, z) dimension. A design as shown in FIG. 1has this feature, as does the design shown in FIG. 32A, with threefrontal microphones MC10, MC20, MC30 and a back microphone MC40 (shadedcircle).

It may be desirable to provide a visualization of an active source on adisplay screen of such a device. The extension principles describedherein may be applied to obtain a straightforward extension from 2D to3D by using a front-back microphone pair. To support a multi-view GUI,we can determine the user's holding pattern by utilizing any of avariety of position detection methods, such as an accelerometer,gyrometer, proximity sensor and/or a variance of likelihood given by 2Danglogram per each holding pattern. Depending on the current holdingpattern, we can switch to two non-coaxial microphone pairs asappropriate to such a holding pattern and can also provide acorresponding 360° 2D representation on the display, if the user wantsto see it.

For example, such a method may be implemented to support switching amonga range of modes that may include a desktop hands-free (e.g.,speakerphone) mode, a portrait browse-talk mode, and a landscapebrowse-talk mode. FIG. 32B shows an example of a desktop hands-free modewith three frontal microphones MC10, MC20, MC30 and a correspondingvisualization on a display screen of the device. FIG. 32D shows anexample of a handheld hands-free (portrait) mode, with two frontalmicrophones MC10, MC20, and one back microphone MC40 (shaded circle)being activated, and a corresponding display. FIG. 32C shows an exampleof a handheld hands-free (landscape) mode, with a different pair offrontal microphones MC10, MC20 and one back microphone MC40 (shadedcircle) being activated, and a corresponding display. In someconfigurations, the back microphone MC40 may be located on the back ofthe device, approximately behind one of the frontal microphones MC10.

It may be desirable to provide an enhancement of a target source. The extension principles described herein may be applied to obtain a straightforward extension from 2D to 3D by also using a front-back microphone pair. Instead of only two DOA estimates (θ1, θ2), we obtain an additional estimate from another dimension for a total of three DOA estimates (θ1, θ2, θ3). In this case, the PWBFNF coefficient matrix as shown in FIGS. 14A and 14B expands from 4 by 2 to 6 by 2 (with the added microphone pair), and the masking gain function expands from f(θ1)f(θ2) to f(θ1)f(θ2)f(θ3). Using a position-sensitive selection as described above, we can use all three microphone pairs optimally, regardless of the current holding pattern, to obtain a seamless transition among the modes in terms of the source enhancement performance. Of course, more than three pairs may be used at one time as well.

Each of the microphones for direction estimation as discussed herein(e.g., with reference to location and tracking of one or more users orother sources) may have a response that is omnidirectional,bidirectional, or unidirectional (e.g., cardioid). The various types ofmicrophones that may be used include (without limitation) piezoelectricmicrophones, dynamic microphones, and electret microphones. It isexpressly noted that the microphones may be implemented more generallyas transducers sensitive to radiations or emissions other than sound. Inone such example, the microphone array is implemented to include one ormore ultrasonic transducers (e.g., transducers sensitive to acousticfrequencies greater than fifteen, twenty, twenty-five, thirty, forty orfifty kilohertz or more).

An apparatus as disclosed herein may be implemented as a combination ofhardware (e.g., a processor) with software and/or with firmware. Suchapparatus may also include an audio preprocessing stage AP10 as shown inFIG. 38A that performs one or more preprocessing operations on signalsproduced by each of the microphones MC10 and MC20 (e.g., of animplementation of one or more microphone arrays) to produce preprocessedmicrophone signals (e.g., a corresponding one of a left microphonesignal and a right microphone signal) for input to task T10 ordifference calculator 100. Such preprocessing operations may include(without limitation) impedance matching, analog-to-digital conversion,gain control, and/or filtering in the analog and/or digital domains.

FIG. 38B shows a block diagram of a three-channel implementation AP20 ofaudio preprocessing stage AP10 that includes analog preprocessing stagesP10 a, P10 b and P10 c. In one example, stages P10 a, P10 b, and P10 care each configured to perform a high-pass filtering operation (e.g.,with a cutoff frequency of 50, 100, or 200 Hz) on the correspondingmicrophone signal. Typically, stages P10 a, P10 b and P10 c will beconfigured to perform the same functions on each signal.

It may be desirable for audio preprocessing stage AP10 to produce eachmicrophone signal as a digital signal, that is to say, as a sequence ofsamples. Audio preprocessing stage AP20, for example, includesanalog-to-digital converters (ADCs) C10 a, C10 b and C10 c that are eacharranged to sample the corresponding analog signal. Typical samplingrates for acoustic applications include 8 kHz, 12 kHz, 16 kHz, and otherfrequencies in the range of from about 8 to about 16 kHz, althoughsampling rates as high as about 44.1, 48 or 192 kHz may also be used.Typically, converters C10 a, C10 b and C10 c will be configured tosample each signal at the same rate.

In this example, audio preprocessing stage AP20 also includes digitalpreprocessing stages P20 a, P20 b, and P20 c that are each configured toperform one or more preprocessing operations (e.g., spectral shaping) onthe corresponding digitized channel to produce a corresponding one of aleft microphone signal AL10, a center microphone signal AC10, and aright microphone signal AR10 for input to task T10 or differencecalculator 100. Typically, stages P20 a, P20 b and P20 c will beconfigured to perform the same functions on each signal. It is alsonoted that preprocessing stage AP10 may be configured to produce adifferent version of a signal from at least one of the microphones(e.g., at a different sampling rate and/or with different spectralshaping) for content use, such as to provide a near-end speech signal ina voice communication (e.g., a telephone call). Although FIGS. 38A and38B show two channel and three-channel implementations, respectively, itwill be understood that the same principles may be extended to anarbitrary number of microphones.

FIG. 39A shows a block diagram of an implementation MF15 of apparatusMF10 that includes means F40 for indicating a direction of arrival,based on a plurality of candidate direction selections produced by meansF30 (e.g., as described herein with reference to implementations of taskT400). Apparatus MF15 may be implemented, for example, to perform aninstance of method M25 and/or M110 as described herein.

The signals received by a microphone pair or other linear array ofmicrophones may be processed as described herein to provide an estimatedDOA that indicates an angle with reference to the axis of the array. Asdescribed above (e.g., with reference to methods M20, M25, M100, andM110), more than two microphones may be used in a linear array toimprove DOA estimation performance across a range of frequencies. Evenin such cases, however, the range of DOA estimation supported by alinear (i.e., one-dimensional) array is typically limited to 180degrees.

FIG. 2B shows a measurement model in which a one-dimensional DOAestimate indicates an angle (in the 180-degree range of +90 degrees to−90 degrees) relative to a plane that is orthogonal to the axis of thearray. Although implementations of methods M200 and M300 and task TB200are described below with reference to a context as shown in FIG. 2B, itwill be recognized that such implementations are not limited to thiscontext and that corresponding implementations with reference to othercontexts (e.g., in which the DOA estimate indicates an angle of 0 to 180degrees relative to the axis in the direction of microphone MC10 or,alternatively, in the direction away from microphone MC10) are expresslycontemplated and hereby disclosed.

The desired angular span may be arbitrary within the 180-degree range. For example, the DOA estimates may be limited to selected sectors of interest within that range. The desired angular resolution may also be arbitrary (e.g., uniformly distributed over the range, or nonuniformly distributed). Additionally or alternatively, the desired frequency span may be arbitrary (e.g., limited to a voice range) and/or the desired frequency resolution may be arbitrary (e.g., linear, logarithmic, mel-scale, Bark-scale, etc.).

FIG. 39B shows an example of an ambiguity that results from the one-dimensionality of a DOA estimate from a linear array. In this example, a DOA estimate from microphone pair MC10, MC20 (e.g., as a candidate direction as produced by selector 300, or a DOA estimate as produced by indicator 400) indicates an angle θ with reference to the array axis. Even if this estimate is very accurate, however, it does not indicate whether the source is located along line d1 or along line d2.

As a consequence of its one-dimensionality, a DOA estimate from a linear microphone array actually describes a right circular conical surface around the array axis in space (assuming that the responses of the microphones are perfectly omnidirectional) rather than any particular direction in space. The actual location of the source on this conical surface (also called a "cone of confusion") is indeterminate. FIG. 39C shows one example of such a surface.

FIG. 40 shows an example of source confusion in a speakerphoneapplication in which three sources (e.g., mouths of human speakers) arelocated in different respective directions relative to device D100(e.g., a smartphone) having a linear microphone array. In this example,the source directions d1, d2, and d3 all happen to lie on a cone ofconfusion that is defined at microphone MC20 by an angle (θ+90 degrees)relative to the array axis in the direction of microphone MC10. Becauseall three source directions have the same angle relative to the arrayaxis, the microphone pair produces the same DOA estimate for each sourceand fails to distinguish among them.

To provide for an estimate having a higher dimensionality, it may be desirable to extend the DOA estimation principles described herein to a two-dimensional (2-D) array of microphones. FIG. 41A shows a 2-D microphone array that includes two microphone pairs having orthogonal axes. In this example, the axis of the first pair MC10, MC20 is the x axis and the axis of the second pair MC20, MC30 is the y axis. An instance of an implementation of method M10 may be performed for the first pair to produce a corresponding 1-D DOA estimate θ_(x), and an instance of an implementation of method M10 may be performed for the second pair to produce a corresponding 1-D DOA estimate θ_(y). For a signal that arrives from a source located in the plane defined by the microphone axes, the cones of confusion described by θ_(x) and θ_(y) coincide at the direction of arrival d of the signal to indicate a unique direction in the plane.

FIG. 41B shows a flowchart of a method M200 according to a general configuration that includes tasks TB100 a, TB100 b, and TB200. Task TB100 a calculates a first DOA estimate for a multichannel signal with respect to an axis of a first linear array of microphones, and task TB100 b calculates a second DOA estimate for the multichannel signal with respect to an axis of a second linear array of microphones. Each of tasks TB100 a and TB100 b may be implemented, for example, as an instance of an implementation of method M10 (e.g., method M20, M30, M100, or M110) as described herein. Based on the first and second DOA estimates, task TB200 calculates a combined DOA estimate.

The range of the combined DOA estimate may be greater than the range of either of the first and second DOA estimates. For example, task TB200 may be implemented to combine 1-D DOA estimates, produced by tasks TB100 a and TB100 b and having individual ranges of up to 180 degrees, to produce a combined DOA estimate that indicates the DOA as an angle in a range of up to 360 degrees. Task TB200 may be implemented to map 1-D DOA estimates θ_(x), θ_(y) to a direction in a larger angular range by applying a mapping, such as

$\theta_{c} = \begin{cases} \theta_{y}, & \theta_{x} > 0 \\ 180^{\circ} - \theta_{y}, & \text{otherwise}, \end{cases} \qquad (1)$

to combine one angle with information (e.g., sign information) from the other angle. For the 1-D estimates (θ_(x), θ_(y))=(45°, 45°) as shown in FIG. 41A, for example, task TB200 may be implemented to apply such a mapping to obtain a combined estimate θ_(c) of 45 degrees relative to the x-axis. For a case in which the range of the DOA estimates is 0 to 180 degrees rather than −90 to +90 degrees, it will be understood that the axial polarity (i.e., positive or negative) condition in expression (1) would be expressed in terms of whether the DOA estimate under test is less than or greater than 90 degrees.
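A minimal sketch of the mapping of expression (1), including the (45°, 45°) case of FIG. 41A, is given below; the wrap to a 0-360-degree display range is an assumption made for illustration.

```python
def combine_1d_estimates(theta_x_deg, theta_y_deg):
    """Map two 1-D DOA estimates (each -90..+90 degrees) to an azimuth in the
    array plane, following expression (1)."""
    if theta_x_deg > 0:
        theta_c = theta_y_deg
    else:
        theta_c = 180.0 - theta_y_deg
    # Wrap to 0..360 degrees for display on a polar plot.
    return theta_c % 360.0

print(combine_1d_estimates(45.0, 45.0))   # 45 degrees, as in FIG. 41A
print(combine_1d_estimates(-45.0, 45.0))  # 135 degrees
```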

It may be desirable to show the combined DOA estimate θ_(c) on a360-degree-range display. For example, it may be desirable to displaythe DOA estimate as an angle on a planar polar plot. Planar polar plotdisplay is familiar in applications such as radar and biomedicalscanning, for example. FIG. 41C shows an example of a DOA estimate shownon such a display. In this example, the direction of the line indicatesthe DOA estimate and the length of the line indicates the currentstrength of the component arriving from that direction. As shown in thisexample, the polar plot may also include one or more concentric circlesto indicate intensity of the directional component on a linear orlogarithmic (e.g., decibel) scale. For a case in which more than one DOAestimate is available at one time (e.g., for sources that are disjointin frequency), a corresponding line for each DOA estimate may bedisplayed. Alternatively, the DOA estimate may be displayed on arectangular coordinate system (e.g., Cartesian coordinates).

FIGS. 42A and 42B show correspondences between the signs of the 1-D estimates θ_(x) and θ_(y), respectively, and corresponding quadrants of the plane defined by the array axes. FIG. 42C shows a correspondence between the four values of the tuple (sign(θ_(x)), sign(θ_(y))) and the quadrants of the plane. FIG. 42D shows a 360-degree display according to an alternate mapping (e.g., relative to the y-axis)

$$\theta_{c} = \begin{cases} -\theta_{x}, & \theta_{y} > 0 \\ \theta_{x} + 180^{\circ}, & \text{otherwise}. \end{cases} \qquad (2)$$

It is noted that FIG. 41A illustrates a special case in which the source is located in the plane defined by the microphone axes, such that the cones of confusion described by θ_(x) and θ_(y) indicate a unique direction in this plane. For most practical applications, it may be expected that the cones of confusion of nonlinear microphone pairs of a 2-D array typically will not coincide in a plane defined by the array, even for a far-field point source. For example, source height relative to the plane of the array (e.g., displacement of the source along the z-axis) may play an important role in 2-D tracking.

It may be desirable to produce an accurate 2-D representation of directions of arrival for signals that are received from sources at arbitrary locations in a three-dimensional space. For example, it may be desirable for the combined DOA estimate produced by task TB200 to indicate the DOA of a source signal in a plane that does not include the DOA (e.g., a plane defined by the microphone array or by a display surface of the device). Such indication may be used, for example, to support arbitrary placement of the audio sensing device relative to the source and/or arbitrary relative movement of the device and source (e.g., for speakerphone and/or source tracking applications).

FIG. 43A shows an example that is similar to FIG. 41A but depicts a more general case in which the source is located above the x-y plane. In such case, the intersection of the cones of confusion of the arrays indicates two possible directions of arrival: a direction d1 that extends above the x-y plane, and a direction d2 that extends below the x-y plane. In many applications, this ambiguity may be resolved by assuming that direction d1 is correct and ignoring the second direction d2. For a speakerphone application in which the device is placed on a tabletop, for example, it may be assumed that no sources are located below the device. In any case, the projections of directions d1 and d2 on the x-y plane are the same.

While a mapping of 1-D estimates θ_(x) and θ_(y) to a range of 360 degrees (e.g., as in expression (1) or (2)) may produce an appropriate DOA indication when the source is located in the microphone plane, it may produce an inaccurate result for the more general case of a source that is not located in that plane. For a case in which θ_(x)=θ_(y) as shown in FIG. 41B, for example, it may be understood that the corresponding direction in the x-y plane is 45 degrees relative to the x axis. Applying the mapping of expression (1) to the values (θ_(x), θ_(y))=(30°, 30°), however, produces a combined estimate θ_(c) of 30 degrees relative to the x axis, which does not correspond to the source direction as projected on the plane.

FIG. 43B shows another example of a 2-D microphone array whose axes define an x-y plane and a source that is located above the x-y plane (e.g., a speakerphone application in which the speaker's mouth is above the tabletop). With respect to the x-y plane, the source is located along the y axis (e.g., at an angle of 90 degrees relative to the x axis). The x-axis pair MC10, MC20 indicates a DOA of zero degrees relative to the y-z plane (i.e., broadside to the pair axis), which agrees with the source direction as projected onto the x-y plane. Although the source is located directly above the y axis, it is also offset in the direction of the z axis by an elevation angle of 30 degrees. This elevation of the source from the x-y plane causes the y-axis pair MC20, MC30 to indicate a DOA of sixty degrees (i.e., relative to the x-z plane) rather than ninety degrees. Applying the mapping of expression (1) to the values (θ_(x), θ_(y))=(0°, 60°) produces a combined estimate θ_(c) of 60 degrees relative to the x axis, which does not correspond to the source direction as projected on the plane.

In a typical use case, the source will be located in a direction that is neither within a plane defined by the array axes nor directly above an array axis. FIG. 43C shows an example of such a general case in which a point source (i.e., a speaker's mouth) is elevated above the plane defined by the array axes. In order to obtain a correct indication in the array plane of a source direction that is outside that plane, it may be desirable to implement task TB200 to convert the 1-D DOA estimates into an angle in the array plane to obtain a corresponding DOA estimate in the plane.

FIGS. 44A-44D show a derivation of such a conversion of (θ_(x), θ_(y)) into an angle in the array plane. In FIGS. 44A and 44B, the source vector d is projected onto the x axis and onto the y axis, respectively. The lengths of these projections (d sin θ_(x) and d sin θ_(y), respectively) are the dimensions of the projection p of source vector d onto the x-y plane, as shown in FIG. 44C. These dimensions are sufficient to determine conversions of DOA estimates (θ_(x), θ_(y)) into angles (θ̂_(x), θ̂_(y)) of p in the x-y plane relative to the y-axis and relative to the x-axis, respectively, as shown in FIG. 44D:

$$\hat{\theta}_{x} = \tan^{-1}\!\left( \frac{\sin\theta_{x}}{\sin\theta_{y} + \varepsilon} \right), \qquad \hat{\theta}_{y} = \tan^{-1}\!\left( \frac{\sin\theta_{y}}{\sin\theta_{x} + \varepsilon} \right) \qquad (3)$$

where ε is a small value as may be included to avoid a divide-by-zero error. (It is noted with reference to FIGS. 43B, 43C, 44A-E, and also 46A-E as discussed below, that the relative magnitude of d as shown is only for convenience of illustration, and that the magnitude of d should be large enough relative to the dimensions of the microphone array for the far-field assumption of planar wavefronts to remain valid.)

Task TB200 may be implemented to convert the DOA estimates according to such an expression into a corresponding angle in the array plane and to apply a mapping (e.g., as in expression (1) or (2)) to the converted angle to obtain a combined DOA estimate θ_(c) in that plane. It is noted that such an implementation of task TB200 may omit calculation of θ̂_(y) (alternatively, of θ̂_(x)) as included in expression (3), as the value θ_(c) may be determined from θ̂_(x) as combined with sign(θ̂_(y))=sign(θ_(y)) (e.g., as shown in expressions (1) and (2)). For such a case in which the value of |θ̂_(y)| is also desired, it may be calculated as |θ̂_(y)|=90°−|θ̂_(x)| (and likewise for |θ̂_(x)|).

FIG. 43C shows an example in which the DOA of the source signal passes through the point (x, y, z)=(5, 2, 5). In this case, the DOA observed by the x-axis microphone pair MC10-MC20 is θ_(x)=tan⁻¹(5/√(25+4))≈42.9°, and the DOA observed by the y-axis microphone pair MC20-MC30 is θ_(y)=tan⁻¹(2/√(25+25))≈15.8°. Using expression (3) to convert these angles into corresponding angles in the x-y plane produces the converted DOA estimates (θ̂_(x), θ̂_(y))=(68.2°, 21.8°), which correspond to the given source location (x, y)=(5, 2).

Applying expression (3) to the values (θ_(x), θ_(y))=(30°, 30°) as shown in FIG. 41B produces the converted estimates (θ̂_(x), θ̂_(y))=(45°, 45°), which are mapped by expression (1) to the expected value of 45 degrees relative to the x axis. Applying expression (3) to the values (θ_(x), θ_(y))=(0°, 60°) as shown in FIG. 43B produces the converted estimates (θ̂_(x), θ̂_(y))=(0°, 90°), which are mapped by expression (1) to the expected value of 90 degrees relative to the x axis.
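
The conversion of expression (3) followed by the mapping of expression (1) can be checked against the worked values above. The following sketch (Python; the names and the epsilon value are illustrative assumptions) reproduces the (30°, 30°) → 45° and (0°, 60°) → 90° results:

```python
import math

EPS = 1e-12  # small value to avoid divide-by-zero, as in expression (3)

def to_plane_angles(theta_x_deg, theta_y_deg):
    # Expression (3): convert the per-pair DOA estimates into the angles of
    # the projection p in the x-y plane (relative to the y and x axes).
    sx = math.sin(math.radians(theta_x_deg))
    sy = math.sin(math.radians(theta_y_deg))
    hat_x = math.degrees(math.atan(sx / (sy + EPS)))
    hat_y = math.degrees(math.atan(sy / (sx + EPS)))
    return hat_x, hat_y

def combined_estimate(theta_x_deg, theta_y_deg):
    # Convert first, then apply the mapping of expression (1) to the
    # converted angles to obtain theta_c relative to the x axis.
    hat_x, hat_y = to_plane_angles(theta_x_deg, theta_y_deg)
    return hat_y if hat_x > 0 else 180.0 - hat_y

print(combined_estimate(30.0, 30.0))  # ~45 degrees relative to the x axis
print(combined_estimate(0.0, 60.0))   # ~90 degrees relative to the x axis
```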

Task TB200 may be implemented to apply a conversion and mapping as described above to project a DOA, as indicated by any such pair of DOA estimates from a 2-D orthogonal array, onto the plane in which the array is located. Such projection may be used to enable tracking directions of active speakers over a 360° range around the microphone array, regardless of height difference. FIG. 45A shows a plot obtained by applying an alternate mapping

$$\theta_{c} = \begin{cases} -\theta_{y}, & \theta_{x} < 0 \\ \theta_{y} + 180^{\circ}, & \text{otherwise} \end{cases}$$

to the converted estimates (θ̂_(x), θ̂_(y))=(0°, 90°) from FIG. 43B to obtain a combined directional estimate (e.g., an azimuth) of 270 degrees. In this figure, the labels on the concentric circles indicate relative magnitude in decibels.

Task TB200 may also be implemented to include a validity check on the observed DOA estimates prior to calculation of the combined DOA estimate. It may be desirable, for example, to verify that the value (|θ_(x)|+|θ_(y)|) is at least equal to 90 degrees (e.g., to verify that the cones of confusion associated with the two observed estimates will intersect along at least one line).

In fact, the information provided by such DOA estimates from a 2-D microphone array is nearly complete in three dimensions, except for the up-down confusion. For example, the directions of arrival observed by microphone pairs MC10-MC20 and MC20-MC30 may also be used to estimate the magnitude of the angle of elevation of the source relative to the x-y plane. If d denotes the vector from microphone MC20 to the source, then the lengths of the projections of vector d onto the x-axis, the y-axis, and the x-y plane may be expressed as d sin(θ_(x)), d sin(θ_(y)), and d√(sin²(θ_(x))+sin²(θ_(y))), respectively (e.g., as shown in FIGS. 44A-44E). The magnitude of the angle of elevation may then be estimated as θ_(h)=cos⁻¹√(sin²(θ_(x))+sin²(θ_(y))).
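
The elevation estimate above is a one-line computation. As a sketch (Python; the clamp against rounding error is an added assumption, not from the source):

```python
import math

def elevation_magnitude(theta_x_deg, theta_y_deg):
    # Magnitude of the angle of elevation relative to the x-y plane,
    # estimated from the two in-plane DOA observations.
    sx = math.sin(math.radians(theta_x_deg))
    sy = math.sin(math.radians(theta_y_deg))
    r = min(1.0, math.sqrt(sx * sx + sy * sy))  # clamp against rounding error
    return math.degrees(math.acos(r))

# For the FIG. 43C source observed at roughly (42.9, 15.8) degrees, the
# estimated elevation magnitude is roughly 43 degrees.
print(elevation_magnitude(42.9, 15.8))
```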

Although the linear microphone arrays in some particular examples have orthogonal axes, it may be desirable to implement method M200 for a more general case in which the axes of the microphone arrays are not orthogonal. FIG. 45B shows an example of the intersecting cones of confusion associated with the responses of linear microphone arrays having non-orthogonal axes x and r to a common point source. FIG. 45C shows the lines of intersection of these cones, which define the two possible directions d1 and d2 of the point source with respect to the array axes in three dimensions.

FIG. 46A shows an example of a microphone array MC10-MC20-MC30 in which the axis of pair MC10-MC20 is the x axis, and the axis r of pair MC20-MC30 lies in the x-y plane and is skewed relative to the y axis by a skew angle α. FIG. 46B shows an example of obtaining a combined directional estimate in the x-y plane with respect to orthogonal axes x and y with observations (θ_(x), θ_(r)) from an array as shown in FIG. 46A. If d denotes the vector from microphone MC20 to the source, then the lengths of the projections of vector d onto the x-axis (d_(x)) and onto the axis r (d_(r)) may be expressed as d sin(θ_(x)) and d sin(θ_(r)), respectively, as shown in FIGS. 46B and 46C. The vector p=(p_(x), p_(y)) denotes the projection of vector d onto the x-y plane. The estimated value of p_(x)=d sin θ_(x) is known, and it remains to determine the value of p_(y).

We assume that the value of α is in the range (−90°, +90°), as an array having any other value of α may easily be mapped to such a case. The value of p_(y) may be determined from the dimensions of the projection vector d_(r)=(d sin θ_(r) sin α, d sin θ_(r) cos α) as shown in FIGS. 46D and 46E. Observing that the difference between vector p and vector d_(r) is orthogonal to d_(r) (i.e., that the inner product

$\langle p - d_{r},\; d_{r} \rangle$

is equal to zero), we calculate p_(y) as

$$p_{y} = d\,\frac{\sin\theta_{r} - \sin\theta_{x}\sin\alpha}{\cos\alpha}$$

(which reduces to p_(y)=d sin θ_(r) for α=0). The desired angles of arrival in the x-y plane, relative to the orthogonal x and y axes, may then be expressed respectively as

$$(\hat{\theta}_{x}, \hat{\theta}_{y}) = \left( \tan^{-1}\!\left( \frac{\sin\theta_{x}\cos\alpha}{\sin\theta_{r} - \sin\theta_{x}\sin\alpha + \varepsilon} \right),\; \tan^{-1}\!\left( \frac{\sin\theta_{r} - \sin\theta_{x}\sin\alpha}{\sin\theta_{x}\cos\alpha + \varepsilon} \right) \right). \qquad (4)$$

It is noted that expression (3) is a special case of expression (4) in which α=0. The dimensions (p_(x), p_(y)) of projection p may also be used to estimate the angle of elevation θ_(h) of the source relative to the x-y plane (e.g., in a similar manner as described above with reference to FIG. 44E).
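
Expression (4) may be sketched in the same style (Python; the names are illustrative, and the skew angle α is assumed to be given in degrees). Setting α to zero reproduces expression (3):

```python
import math

EPS = 1e-12  # small value to avoid divide-by-zero, as in expression (4)

def to_plane_angles_skewed(theta_x_deg, theta_r_deg, alpha_deg):
    # Expression (4): angles of arrival in the x-y plane relative to the
    # orthogonal x and y axes, from observations of a pair on the x axis
    # and a pair whose axis r is skewed from the y axis by alpha.
    sx = math.sin(math.radians(theta_x_deg))
    sr = math.sin(math.radians(theta_r_deg))
    ca = math.cos(math.radians(alpha_deg))
    sa = math.sin(math.radians(alpha_deg))
    num_y = sr - sx * sa  # sin(theta_r) - sin(theta_x)sin(alpha), per expression (4)
    hat_x = math.degrees(math.atan((sx * ca) / (num_y + EPS)))
    hat_y = math.degrees(math.atan(num_y / (sx * ca + EPS)))
    return hat_x, hat_y

print(to_plane_angles_skewed(30.0, 30.0, 0.0))  # ~(45, 45), matching expression (3)
```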

FIG. 47A shows a flowchart of a method M300 according to a general configuration that includes instances of tasks TB100 a and TB100 b. Method M300 also includes an implementation TB300 of task TB200 that calculates a projection of the direction of arrival into a plane that does not include the direction of arrival (e.g., a plane defined by the array axes). In such manner, a 2-D array may be used to extend the range of source DOA estimation from a linear, 180-degree estimate to a planar, 360-degree estimate. FIG. 47C illustrates one example of an apparatus A300 with components (e.g., a first DOA estimator B100 a, a second DOA estimator B100 b and a projection calculator B300) for performing functions corresponding to FIG. 47A. FIG. 47D illustrates one example of an apparatus MF300 including means (e.g., means FB100 a for calculating a first DOA estimate with respect to an axis of a first array, means FB100 b for calculating a second DOA estimate with respect to an axis of a second array and means FB300 for calculating a projection of a DOA onto a plane that does not include the DOA) for performing functions corresponding to FIG. 47A.

FIG. 47B shows a flowchart of an implementation TB302 of task TB300 that includes subtasks TB310 and TB320. Task TB310 converts the first DOA estimate (e.g., θ_(x)) to an angle in the projection plane (e.g., θ̂_(x)). For example, task TB310 may perform a conversion as shown in, e.g., expression (3) or (4). Task TB320 combines the converted angle with information (e.g., sign information) from the second DOA estimate to obtain the projection of the direction of arrival. For example, task TB320 may perform a mapping according to, e.g., expression (1) or (2).

As described above, extension of source DOA estimation to two dimensions may also include estimation of the angle of elevation of the DOA over a range of 90 degrees (e.g., to provide a measurement range that describes a hemisphere over the array plane). FIG. 48A shows a flowchart of such an implementation M320 of method M300 that includes a task TB400. Task TB400 calculates an estimate of the angle of elevation of the DOA with reference to a plane that includes the array axes (e.g., as described herein with reference to FIG. 44E). Method M320 may also be implemented to combine the projected DOA estimate with the estimated angle of elevation to produce a three-dimensional vector.

It may be desirable to perform an implementation of method M300 within an audio sensing device that has a 2-D array including two or more linear microphone arrays. Examples of a portable audio sensing device that may be implemented to include such a 2-D array and may be used to perform such a method for audio recording and/or voice communications applications include a telephone handset (e.g., a cellular telephone handset); a wired or wireless headset (e.g., a Bluetooth headset); a handheld audio and/or video recorder; a personal media player configured to record audio and/or video content; a personal digital assistant (PDA) or other handheld computing device; and a notebook computer, laptop computer, netbook computer, tablet computer, or other portable computing device. The class of portable computing devices currently includes devices having names such as laptop computers, notebook computers, netbook computers, ultra-portable computers, tablet computers, mobile Internet devices, smartbooks, and smartphones. Such a device may have a top panel that includes a display screen and a bottom panel that may include a keyboard, wherein the two panels may be connected in a clamshell or other hinged relationship. Such a device may be similarly implemented as a tablet computer that includes a touchscreen display on a top surface.

Extension of DOA estimation to a 2-D array (e.g., as described herein with reference to implementations of method M200 and implementations of method M300) is typically well-suited to and sufficient for a speakerphone application. However, further extension of such principles to an N-dimensional array (where N≥2) is also possible and may be performed in a straightforward manner. For example, FIGS. 41A-46E illustrate use of observed DOA estimates from different microphone pairs in an x-y plane to obtain an estimate of a source direction as projected into the x-y plane. In the same manner, an instance of method M200 or M300 may be implemented to combine observed DOA estimates from an x-axis microphone pair and a z-axis microphone pair (or other pairs in the x-z plane) to obtain an estimate of the source direction as projected into the x-z plane, and likewise for the y-z plane or any other plane that intersects three or more of the microphones. The 2-D projected estimates may then be combined to obtain the estimated DOA in three dimensions. For example, a DOA estimate for a source as projected onto the x-y plane may be combined with a DOA estimate for the source as projected onto the x-z plane to obtain a combined DOA estimate as a vector in (x, y, z) space.

For tracking applications in which one target is dominant, it may be desirable to select N linear microphone arrays (e.g., pairs) for representing N respective dimensions. Method M200 or M300 may be implemented to combine a 2-D result, obtained with a particular pair of such linear arrays, with a DOA estimate from each of one or more linear arrays in other planes to provide additional degrees of freedom.

Estimates of DOA error from different dimensions may be used to obtain a combined likelihood estimate, for example, using an expression such as

$$\frac{1}{\max\!\left( \left| \theta - \theta_{0,1} \right|_{f,1}^{2},\; \left| \theta - \theta_{0,2} \right|_{f,2}^{2} \right) + \lambda} \quad \text{or} \quad \frac{1}{\operatorname{mean}\!\left( \left| \theta - \theta_{0,1} \right|_{f,1}^{2},\; \left| \theta - \theta_{0,2} \right|_{f,2}^{2} \right) + \lambda},$$

where θ_(0,i) denotes the DOA candidate selected for pair i. Use of the maximum among the different errors may be desirable to promote selection of an estimate that is close to the cones of confusion of both observations, in preference to an estimate that is close to only one of the cones of confusion and may thus indicate a false peak. Such a combined result may be used to obtain a (frame, angle) plane, as shown in FIG. 8 and described herein, and/or a (frame, frequency) plot, as shown at the bottom of FIG. 9 and described herein.
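
A minimal sketch of such a fusion of per-pair errors follows (Python; the function and the default value of λ are illustrative assumptions):

```python
def combined_likelihood(errors_sq, lam=1e-6, use_max=True):
    # errors_sq[i] is the squared DOA error |theta - theta_0_i|^2 observed
    # for microphone pair i at the frequency of interest; lam is a small
    # regularization constant.  use_max selects the max-based form, which
    # favors candidates close to the cones of confusion of all observations.
    combined = max(errors_sq) if use_max else sum(errors_sq) / len(errors_sq)
    return 1.0 / (combined + lam)

print(combined_likelihood([0.01, 0.02]))                 # max-based form
print(combined_likelihood([0.01, 0.02], use_max=False))  # mean-based form
```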

FIG. 48B shows a flowchart for an implementation M325 of method M320 that includes task TB100 c and an implementation TB410 of task TB400. Task TB100 c calculates a third estimate of the direction of arrival with respect to an axis of a third microphone array. Task TB410 estimates the angle of elevation based on information from the DOA estimates from tasks TB100 a, TB100 b, and TB100 c.

It is expressly noted that methods M200 and M300 may be implemented such that task TB100 a calculates its DOA estimate based on one type of difference between the corresponding microphone channels (e.g., a phase-based difference), and task TB100 b (or TB100 c) calculates its DOA estimate based on another type of difference between the corresponding microphone channels (e.g., a gain-based difference). In one application of such an example of method M325, an array that defines an x-y plane is expanded to include a front-back pair (e.g., a fourth microphone located at an offset along the z axis with respect to microphone MC10, MC20, or MC30). The DOA estimate produced by task TB100 c for this pair is used in task TB400 to resolve the front-back ambiguity in the angle of elevation, such that the method provides a full spherical measurement range (e.g., 360 degrees in any plane). In this case, method M325 may be implemented such that the DOA estimates produced by tasks TB100 a and TB100 b are based on phase differences, and the DOA estimate produced by task TB100 c is based on gain differences. In a particular example (e.g., for tracking of only one source), the DOA estimate produced by task TB100 c has two states: a first state indicating that the source is above the plane, and a second state indicating that the source is below the plane.

FIG. 49A shows a flowchart of an implementation M330 of method M300. Method M330 includes a task TB500 that displays the calculated projection to a user of the audio sensing device. Task TB500 may be configured, for example, to display the calculated projection on a display screen of the device in the form of a polar plot (e.g., as shown in FIGS. 41C, 42D, and 45A). Examples of such a display screen, which may be a touchscreen as shown in FIG. 1, include a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, an electrowetting display, an electrophoretic display, and an interferometric modulator display. Such a display may also include an indication of the estimated angle of elevation (e.g., as shown in FIG. 49B).

Task TB500 may be implemented to display the projected DOA with respect to a reference direction of the device (e.g., a principal axis of the device). In such case, the direction as indicated will change as the device is rotated relative to a stationary source, even if the position of the source does not change. FIGS. 50A and 50B show examples of such a display before and after such rotation, respectively.

Alternatively, it may be desirable to implement task TB500 to display the projected DOA relative to an external reference direction, such that the direction as indicated remains constant as the device is rotated relative to a stationary source. FIGS. 51A and 51B show examples of such a display before and after such rotation, respectively.

To support such an implementation of task TB500, device D100 may be configured to include an orientation sensor (not shown) that indicates a current spatial orientation of the device with reference to an external reference direction, such as a gravitational axis (e.g., an axis that is normal to the earth's surface) or a magnetic axis (e.g., the earth's magnetic axis). The orientation sensor may include one or more inertial sensors, such as gyroscopes and/or accelerometers. A gyroscope uses principles of angular momentum to detect changes in orientation about an axis or about each of two or three (typically orthogonal) axes (e.g., changes in pitch, roll and/or twist). Examples of gyroscopes, which may be fabricated as micro-electromechanical systems (MEMS) devices, include vibratory gyroscopes. An accelerometer detects acceleration along an axis or along each of two or three (typically orthogonal) axes. An accelerometer may also be fabricated as a MEMS device. It is also possible to combine a gyroscope and an accelerometer into a single sensor. Additionally or alternatively, the orientation sensor may include one or more magnetic field sensors (e.g., magnetometers), which measure magnetic field strength along an axis or along each of two or three (typically orthogonal) axes. In one example, device D100 includes a magnetic field sensor that indicates a current orientation of the device relative to a magnetic axis (e.g., of the earth). In such case, task TB500 may be implemented to display the projected DOA on a grid that is rotated into alignment with that axis (e.g., as a compass).

FIG. 49C shows a flowchart of such an implementation M340 of method M330 that includes a task TB600 and an implementation TB510 of task TB500. Task TB600 determines an orientation of the audio sensing device with reference to an external reference axis (e.g., a gravitational or magnetic axis). Task TB510 displays the calculated projection based on the determined orientation.

Task TB500 may be implemented to display the DOA as the angle projected onto the array plane. For many portable audio sensing devices, the microphones used for DOA estimation will be located at the same surface of the device as the display (e.g., microphones ME10, MV10-1, and MV10-3 in FIG. 1) or much closer to that surface than to each other (e.g., microphones ME10, MR10, and MV10-3 in FIG. 1). The thickness of a tablet computer or smartphone, for example, is typically small relative to the dimensions of the display surface. In such cases, any error between the DOA as projected onto the array plane and the DOA as projected onto the display plane may be expected to be negligible, and it may be acceptable to configure task TB500 to display the DOA as projected onto the array plane.

For a case in which the display plane differs noticeably from the array plane, task TB500 may be implemented to project the estimated DOA from a plane defined by the axes of the microphone arrays into a plane of a display surface. For example, such an implementation of task TB500 may display a result of applying a projection matrix to the estimated DOA, where the projection matrix describes a projection from the array plane onto a surface plane of the display. Alternatively, task TB300 may be implemented to include such a projection.

As described above, the audio sensing device may include an orientation sensor that indicates a current spatial orientation of the device with reference to an external reference direction. It may be desirable to combine a DOA estimate as described herein with such orientation information to indicate the DOA estimate with reference to the external reference direction. FIG. 53B shows a flowchart of such an implementation M350 of method M300 that includes an instance of task TB600 and an implementation TB310 of task TB300. Method M350 may also be implemented to include an instance of display task TB500 as described herein.

FIG. 52A shows an example in which the device coordinate system E is aligned with the world coordinate system. FIG. 52A also shows a device orientation matrix F that corresponds to this orientation (e.g., as indicated by the orientation sensor). FIG. 52B shows an example in which the device is rotated (e.g., for use in browse-talk mode) and the matrix F (e.g., as indicated by the orientation sensor) that corresponds to this new orientation.

Task TB310 may be implemented to use the device orientation matrix F to project the DOA estimate into any plane that is defined with reference to the world coordinate system. In one such example, the DOA estimate is a vector g in the device coordinate system. In a first operation, vector g is converted into a vector h in the world coordinate system by an inner product with device orientation matrix F. Such a conversion may be performed, for example, according to an expression such as h=(g^(T)E)^(T)F. In a second operation, the vector h is projected into a plane P that is defined with reference to the world coordinate system by the projection A(A^(T)A)⁻¹A^(T)h, where A is a basis matrix of the plane P in the world coordinate system.

In a typical example, the plane P is parallel to the x-y plane of the world coordinate system (i.e., the "world reference plane"). FIG. 52C shows a perspective mapping, onto a display plane of the device, of a projection of a DOA onto the world reference plane as may be performed by task TB500, where the orientation of the display plane relative to the world reference plane is indicated by the device orientation matrix F. FIG. 53A shows an example of such a mapped display of the DOA as projected onto the world reference plane.
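
One reading of the two operations above, using row-vector products, is sketched below (Python with NumPy; the matrices in the example are placeholders, and in practice E, F and A would come from the device definition, the orientation sensor and the chosen plane P):

```python
import numpy as np

def project_doa_to_world_plane(g, E, F, A):
    # g: DOA estimate vector in device coordinates
    # E: basis matrix of the device coordinate system
    # F: device orientation matrix reported by the orientation sensor
    # A: basis matrix (columns) of the target plane P in world coordinates
    h = (g @ E) @ F                               # device -> world coordinates
    return A @ np.linalg.inv(A.T @ A) @ A.T @ h   # orthogonal projection onto P

# Device aligned with the world frame (as in FIG. 52A); a DOA with an
# elevation component is flattened onto the world reference (x-y) plane.
E = np.eye(3)
F = np.eye(3)
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
g = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)
print(project_doa_to_world_plane(g, E, F, A))     # [0.707..., 0, 0]
```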

In another example, task TB310 is configured to project DOA estimate vector g into plane P using a less complex interpolation among component vectors of g that are projected into plane P. In this case, the projected DOA estimate vector p_(g) may be calculated according to an expression such as

p_(g) = αg_(x-y(p)) + βg_(x-z(p)) + γg_(y-z(p)),

where [e_(x) e_(y) e_(z)] denote the basis vectors of the device coordinate system; g=g_(x)e_(x)+g_(y)e_(y)+g_(z)e_(z); θ_(α), θ_(β), θ_(γ) denote the angles between plane P and the planes spanned by [e_(x) e_(y)], [e_(x) e_(z)], [e_(y) e_(z)], respectively, and α, β, γ denote their respective cosines (α²+β²+γ²=1); and g_(x-y(p)), g_(x-z(p)), g_(y-z(p)) denote the projections into plane P of the component vectors g_(x-y)=[g_(x) g_(y) 0]^(T), g_(x-z)=[g_(x) 0 g_(z)]^(T), and g_(y-z)=[0 g_(y) g_(z)]^(T), respectively. The plane corresponding to the minimum among α, β, and γ is the plane that is closest to P, and an alternative implementation of task TB310 identifies this minimum and produces the corresponding one of the projected component vectors as an approximation of p_(g).
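
The interpolation above may be sketched as follows (Python with NumPy; this assumes the plane P is supplied as a 3×2 basis in the same coordinate system as g, and it computes the weights from the components of P's unit normal, which is one way to realize the stated cosines):

```python
import numpy as np

def approx_project(g, A):
    # Approximate the projection of DOA vector g into plane P (basis A) by a
    # cosine-weighted sum of the projections of g's coordinate-plane components.
    proj = A @ np.linalg.inv(A.T @ A) @ A.T       # orthogonal projector onto P
    n = np.cross(A[:, 0], A[:, 1])
    n = n / np.linalg.norm(n)                     # unit normal of P
    alpha, beta, gamma = abs(n[2]), abs(n[1]), abs(n[0])  # alpha^2 + beta^2 + gamma^2 = 1
    g_xy = np.array([g[0], g[1], 0.0])
    g_xz = np.array([g[0], 0.0, g[2]])
    g_yz = np.array([0.0, g[1], g[2]])
    return alpha * (proj @ g_xy) + beta * (proj @ g_xz) + gamma * (proj @ g_yz)

# When P is the x-y plane itself, the approximation equals the exact projection.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
print(approx_project(np.array([0.3, 0.4, 0.5]), A))  # [0.3, 0.4, 0.0]
```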

It may be desirable to configure an audio sensing device to discriminate among source signals having different DOAs. For example, it may be desirable to configure the audio sensing device to perform a directionally selective filtering operation on the multichannel signal to pass directional components that arrive from directions within an angular pass range and/or to block or otherwise attenuate directional components that arrive from directions within an angular stop range.

It may be desirable to use a display as described herein to support a graphical user interface to enable a user of an audio sensing device to configure a directionally selective processing operation (e.g., a beamforming operation as described herein). FIG. 54A shows an example of such a user interface, in which the unshaded portion of the circle indicates a range of directions to be passed and the shaded portion indicates a range of directions to be blocked. The circles indicate points on a touch screen that the user may slide around the periphery of the circle to change the selected range. The touch points may be linked such that moving one causes the other to move by an equal angle in the same angular direction or, alternatively, in the opposite angular direction. Alternatively, the touch points may be independently selectable (e.g., as shown in FIG. 54B). It is also possible to provide one or more additional pairs of touch points to support selection of more than one angular range (e.g., as shown in FIG. 54C).

As alternatives to touch points as shown in FIGS. 54A-C, the user interface may include other physical or virtual selection interfaces (e.g., clickable or touchable icons on a screen) to obtain user input for selection of pass/stop band location and/or width. Examples of such interfaces include a linear slider potentiometer, a rocker switch (for binary input to indicate, e.g., up-down, left-right, clockwise/counter-clockwise), and a wheel or knob as shown in FIG. 53C.

For use cases in which the audio sensing device is expected to remain stationary during use (e.g., the device is placed on a flat surface for speakerphone use), it may be sufficient to indicate a range of selected directions that is fixed relative to the device. If the orientation of the device relative to a desired source changes during use, however, components arriving from the direction of that source may no longer be admitted. FIGS. 55A and 55B show a further example in which an orientation sensor is used to track an orientation of the device. In this case, a directional displacement of the device (e.g., as indicated by the orientation sensor) is used to update the directional filtering configuration as selected by the user (and to update the corresponding display) such that the desired directional response may be maintained despite a change in orientation of the device.
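
A minimal sketch of such an orientation-compensated update follows (Python; the rotation convention, angle representation and function names are assumptions for illustration only):

```python
def update_pass_range(selected_range_deg, device_rotation_deg):
    # Keep the user-selected pass range fixed in the external frame by
    # compensating for the device's in-plane rotation as reported by the
    # orientation sensor.  The range is (start, end) in degrees.
    start, end = selected_range_deg
    return ((start - device_rotation_deg) % 360.0,
            (end - device_rotation_deg) % 360.0)

# A pass range of 30-90 degrees, after the device rotates by 20 degrees,
# is re-expressed as 10-70 degrees in device coordinates.
print(update_pass_range((30.0, 90.0), 20.0))
```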

It may be desirable for the array to include a number of microphones that is at least equal to the number of different source directions to be distinguished (e.g., the number of beams to be formed) at any one time. The microphones may be omnidirectional (e.g., as may be typical for a cellular telephone or a dedicated conferencing device) or directional (e.g., as may be typical for a device such as a set-top box).

The DOA estimation principles described herein may be used to support selection among multiple speakers. For example, location of multiple sources may be combined with a manual selection of a particular speaker (e.g., push a particular button, or touch a particular screen area, to select a particular corresponding speaker or active source direction) or automatic selection of a particular speaker (e.g., by speaker recognition). In one such application, an audio sensing device (e.g., a telephone) is configured to recognize the voice of its owner and to automatically select a direction corresponding to that voice in preference to the directions of other sources.

B. Systems and Methods for Mapping a Source Location

It should be noted that one or more of the functions, apparatuses, methods and/or algorithms described above may be implemented in accordance with the systems and methods disclosed herein. Some configurations of the systems and methods disclosed herein describe multi-modal sensor fusion for seamless audio processing. For instance, the systems and methods described herein enable projecting multiple DOA information from 3D sound sources captured by microphones into a physical 2D plane using sensor data and a set of microphones located on a 3D device. The microphone signals may be selected based on the DOA information retrieved from the microphones that maximizes the spatial resolution of sound sources in the 2D physical plane, and the sensor data provides a reference for the orientation of the 3D device with respect to the physical 2D plane. There are many use cases that may benefit from the fusion of sensors such as an accelerometer, proximity sensor, etc., with multi-microphones. One example (e.g., "use case 1") may include a robust handset intelligent switch (IS). Another example (e.g., "use case 2") may include robust support for various speakerphone holding patterns. Another example (e.g., "use case 3") may include seamless speakerphone-handset holding pattern support. Yet another example (e.g., "use case 4") may include a multi-view visualization of active source and coordination passing.

Some configurations of the systems and methods disclosed herein may include at least one statistical model for discriminating desired use cases with pre-obtainable sensor data, if necessary. Available sensor data may be tracked along with multi-microphone data, and may be utilized for at least one of the use cases. Some configurations of the systems and methods disclosed herein may additionally or alternatively track sensor data along with other sensor data (e.g., camera data) for at least one use case.

Various configurations are now described with reference to the Figures, where like reference numbers may indicate functionally similar elements. The systems and methods as generally described and illustrated in the Figures herein could be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of several configurations, as represented in the Figures, is not intended to limit scope, as claimed, but is merely representative of the systems and methods. Features and/or elements depicted in one Figure may be combined with at least one feature and/or element depicted in at least one other Figure.

FIG. 56 is a block diagram illustrating one configuration of an electronic device 5602 in which systems and methods for mapping a source location may be implemented. The systems and methods disclosed herein may be applied to a variety of electronic devices 5602. Examples of electronic devices 5602 include cellular phones, smartphones, voice recorders, video cameras, audio players (e.g., Moving Picture Experts Group-1 (MPEG-1) or MPEG-2 Audio Layer 3 (MP3) players), video players, audio recorders, desktop computers, laptop computers, personal digital assistants (PDAs), gaming systems, etc. One kind of electronic device 5602 is a communication device, which may communicate with another device. Examples of communication devices include telephones, laptop computers, desktop computers, cellular phones, smartphones, wireless or wired modems, e-readers, tablet devices, gaming systems, cellular telephone base stations or nodes, access points, wireless gateways and wireless routers, etc.

An electronic device 5602 (e.g., communication device) may operate in accordance with certain industry standards, such as International Telecommunication Union (ITU) standards and/or Institute of Electrical and Electronics Engineers (IEEE) standards (e.g., 802.11 Wireless Fidelity or "Wi-Fi" standards such as 802.11a, 802.11b, 802.11g, 802.11n, 802.11ac, etc.). Other examples of standards that a communication device may comply with include IEEE 802.16 (e.g., Worldwide Interoperability for Microwave Access or "WiMAX"), 3GPP, 3GPP LTE, 3rd Generation Partnership Project 2 (3GPP2), GSM and others (where a communication device may be referred to as a User Equipment (UE), NodeB, evolved NodeB (eNB), mobile device, mobile station, subscriber station, remote station, access terminal, mobile terminal, terminal, user terminal and/or subscriber unit, etc., for example). While some of the systems and methods disclosed herein may be described in terms of at least one standard, this should not limit the scope of the disclosure, as the systems and methods may be applicable to many systems and/or standards.

The electronic device 5602 may include at least one sensor 5604, a mapper 5610 and/or an operation block/module 5614. As used herein, the phrase "block/module" indicates that a particular component may be implemented in hardware (e.g., circuitry), software or a combination of both. For example, the operation block/module 5614 may be implemented with hardware components such as circuitry and/or software components such as instructions or code, etc. Additionally, one or more of the components or elements of the electronic device 5602 may be implemented in hardware (e.g., circuitry), software, firmware or any combination thereof. For example, the mapper 5610 may be implemented in circuitry (e.g., in an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) and/or one or more processors, etc.).

The at least one sensor 5604 may collect data relating to the electronic device 5602. The at least one sensor 5604 may be included in and/or coupled to the electronic device 5602. Examples of sensors 5604 include microphones, accelerometers, gyroscopes, compasses, infrared sensors, tilt sensors, global positioning system (GPS) receivers, proximity sensors, cameras, ultrasound sensors, etc. In some implementations, the at least one sensor 5604 may provide sensor data 5608 to the mapper 5610. Examples of sensor data 5608 include audio signals, accelerometer readings, gyroscope readings, position information, orientation information, location information, proximity information (e.g., whether an object is detected close to the electronic device 5602), images, etc.

In some configurations (described in greater detail below), the mapper 5610 may use the sensor data 5608 to improve audio processing. For example, a user may hold the electronic device 5602 (e.g., a phone) in different orientations for speakerphone usage (e.g., portrait, landscape or even desktop hands-free). Depending on the holding pattern (e.g., the electronic device 5602 orientation), the electronic device 5602 may select appropriate microphone configurations (including a single microphone configuration) to improve spatial audio processing. By adding accelerometer/proximity sensor data 5608, the electronic device 5602 may make the switch seamlessly.

The sensors 5604 (e.g., multiple microphones) may receive one or more audio signals (e.g., a multi-channel audio signal). In some implementations, microphones may be located at various locations of the electronic device 5602, depending on the configuration. For example, microphones may be positioned on the front, sides and/or back of the electronic device 5602 as illustrated above in FIG. 1. Additionally or alternatively, microphones may be positioned near the top and/or bottom of the electronic device 5602. In some cases, the microphones may be configured to be disabled (e.g., not receive an audio signal). For example, the electronic device 5602 may include circuitry that disables at least one microphone in some cases. In some implementations, one or more microphones may be disabled based on the electronic device 5602 orientation. For example, if the electronic device 5602 is in a horizontal face-up orientation on a surface (e.g., a tabletop mode), the electronic device 5602 may disable at least one microphone located on the back of the electronic device 5602. Similarly, if the electronic device 5602 orientation changes (by a large amount, for example), the electronic device 5602 may disable at least one microphone.

A few examples of various microphone configurations are given as follows. In one example, the electronic device 5602 may be designed to use a dual-microphone configuration when possible. Unless the user holds the electronic device 5602 (e.g., phone) in such a way that a normal vector to the display is parallel, or nearly parallel, with the ground (e.g., the electronic device 5602 appears to be vertically oriented, which can be determined based on sensor data 5608), the electronic device 5602 may use a dual-microphone configuration in a category A configuration. In some implementations, in the category A configuration, the electronic device 5602 may include a dual microphone configuration where one microphone may be located near the back-top of the electronic device 5602, and the other microphone may be located near the front-bottom of the electronic device 5602. In this configuration, the electronic device 5602 may be capable of discriminating audio signal sources (e.g., determining the direction of arrival of the audio signals) in a plane that contains a line formed by the locations of the microphones. Based on this configuration, the electronic device 5602 may be capable of discriminating audio signal sources in 180 degrees. Accordingly, the direction of arrival of the audio signals that arrive within the 180 degree span may be discriminated based on the two microphones in the category A configuration. For example, an audio signal received from the left, and an audio signal received from the right of the display of the electronic device 5602 may be discerned. The directionality of one or more audio signals may be determined as described in section A above in some configurations.

In another example, unless the user holds the electronic device 5602 (e.g., phone) in such a way that a normal vector to the display is perpendicular, or nearly perpendicular, with the ground (e.g., the electronic device 5602 appears to be horizontally oriented, which can be informed by sensor data 5608), the electronic device 5602 may use a dual-microphone configuration, with a category B configuration. In this configuration, the electronic device 5602 may include a dual microphone configuration where one microphone may be located near the back-bottom of the electronic device 5602, and the other microphone may be located near the front-bottom of the electronic device 5602. In some implementations, in the category B configuration, one microphone may be located near the back-top of the electronic device 5602, and the other microphone may be located near the front-top of the electronic device 5602.

In the category B configuration, audio signals may be discriminated (e.g., the direction of arrival of the audio signals may be determined) in a plane that contains a line formed by the locations of the microphones. Based on this configuration, there may be 180 degree audio source discrimination. Accordingly, the direction of arrival of the audio signals that arrive within the 180 degree span may be discriminated based on the two microphones in the category B configuration. For example, an audio signal received from the top, and an audio signal received from the bottom of the display of the electronic device 5602 may be discerned. However, two audio signals that are on the left or right of the display of the electronic device 5602 may not be discerned. It should be noted that if the electronic device 5602 orientation were changed, such that the electronic device 5602 were vertically oriented instead of horizontally oriented, the audio signals from the left and right of the display of the electronic device may be discerned. For a three-microphone configuration, category C, the electronic device 5602 may use a front-back pair of microphones for the vertical orientations and may use a top-bottom pair of microphones for horizontal orientations. Using a configuration as in category C, the electronic device 5602 may be capable of discriminating audio signal sources (e.g., discriminating the direction of arrival from different audio signals) in 360 degrees.
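
A hedged sketch of such orientation-driven pair selection follows (Python; the tilt convention, the thresholds and the pair names are illustrative assumptions, not values from the source):

```python
def select_microphone_pair(tilt_deg):
    # tilt_deg: angle between the display surface and the ground plane,
    # derived from accelerometer data (0 = lying flat, 90 = held upright).
    if tilt_deg > 60.0:      # roughly vertical: use the front-back pair
        return "front_back_pair"
    if tilt_deg < 30.0:      # roughly horizontal (tabletop): top-bottom pair
        return "top_bottom_pair"
    return "default_dual_pair"  # intermediate holding pattern

print(select_microphone_pair(80.0))  # front_back_pair
print(select_microphone_pair(10.0))  # top_bottom_pair
```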

The mapper 5610 may determine a mapping 5612 of a source location to electronic device 5602 coordinates and from the electronic device 5602 coordinates to physical coordinates (e.g., a two-dimensional plane corresponding to real-world or earth coordinates) based on the sensor data 5608. The mapping 5612 may include data that indicates mappings (e.g., projections) of a source location to electronic device coordinates and/or to physical coordinates. For example, the mapper 5610 may implement at least one algorithm to map the source location to physical coordinates. In some implementations, the physical coordinates may be two-dimensional physical coordinates. For example, the mapper 5610 may use sensor data 5608 from the at least one sensor 5604 (e.g., integrated accelerometer, proximity and microphone data) to determine an electronic device 5602 orientation (e.g., holding pattern) and to direct the electronic device 5602 to perform an operation (e.g., display a source location, switch microphone configurations and/or configure noise suppression settings).

The mapper 5610 may detect a change in electronic device 5602 orientation. In some implementations, electronic device 5602 (e.g., phone) movements may be detected through the sensor 5604 (e.g., an accelerometer and/or a proximity sensor). The mapper 5610 may utilize these movements, and the electronic device 5602 may adjust microphone configurations and/or noise suppression settings based on the extent of rotation. For example, the mapper 5610 may receive sensor data 5608 from the at least one sensor 5604 that indicates that the electronic device 5602 has changed from a horizontal orientation (e.g., a tabletop mode) to a vertical orientation (e.g., a browse-talk mode). In some implementations, the mapper 5610 may indicate that an electronic device 5602 (e.g., a wireless communication device) has changed orientation from a handset mode (e.g., at the side of a user's head) to a browse-talk mode (e.g., in front of a user at eye level).

The electronic device may also include an operation block/module 5614 that performs at least one operation based on the mapping 5612. For example, the operation block/module 5614 may be coupled to the at least one microphone and may switch the microphone configuration based on the mapping 5612. For example, if the mapping 5612 indicates that the electronic device 5602 has changed from a vertical orientation (e.g., a browse-talk mode) to a horizontal face-up orientation on a flat surface (e.g., a tabletop mode), the operation block/module 5614 may disable at least one microphone located on the back of the electronic device. Similarly, as will be described below, the operation block/module 5614 may switch from a multi-microphone configuration to a single microphone configuration. Other examples of operations include tracking a source in two or three dimensions, projecting a source into a three-dimensional display space and performing non-stationary noise suppression.

FIG. 57 is a flow diagram illustrating one configuration of a method 5700 for mapping electronic device 5602 coordinates. The method 5700 may be performed by the electronic device 5602. The electronic device 5602 may obtain 5702 sensor data 5608. At least one sensor 5604 coupled to the electronic device 5602 may provide sensor data 5608 to the electronic device 5602. Examples of sensor data 5608 include audio signal(s) (from one or more microphones, for example), accelerometer readings, position information, orientation information, location information, proximity information (e.g., whether an object is detected close to the electronic device 5602), images, etc. In some implementations, the electronic device 5602 may obtain 5702 the sensor data 5608 (e.g., the accelerometer x-y-z coordinates) using pre-acquired data for each designated electronic device 5602 orientation (e.g., holding pattern) and corresponding microphone identification.

The electronic device 5602 may map 5704 a source location to electronic device coordinates based on the sensor data. This may be accomplished as described above in connection with one or more of FIGS. 41-48. For example, the electronic device 5602 may estimate a direction of arrival (DOA) of a source relative to electronic device coordinates based on a multichannel signal (e.g., multiple audio signals from two or more microphones). In some approaches, mapping 5704 the source location to electronic device coordinates may include projecting the direction of arrival onto a plane (e.g., a projection plane and/or array plane, etc.) as described above. In some configurations, the electronic device coordinates may be a microphone array plane corresponding to the device. In other configurations, the electronic device coordinates may be another coordinate system corresponding to the electronic device 5602 to which a source location (e.g., DOA) may be mapped (e.g., translated and/or rotated) by the electronic device 5602.

The electronic device 5602 may map 5706 the source location from the electronic device coordinates to physical coordinates (e.g., two-dimensional physical coordinates). This may be accomplished as described above in connection with one or more of FIGS. 49-53. For example, the electronic device may utilize an orientation matrix to project the DOA estimate into a plane that is defined with reference to the world (or earth) coordinate system.

In some configurations, the mapper 5610 included in the electronic device 5602 may implement at least one algorithm to map 5704 the source location to electronic device coordinates and to map 5706 the source location from the electronic device coordinates to physical coordinates. In some configurations, the mapping 5612 may be applied to a "3D audio map." For example, in some configurations, a compass (e.g., a sensor 5604) may provide compass data (e.g., sensor data 5608) to the mapper 5610. In this example, the electronic device 5602 may obtain a sound distribution map over a four-pi (e.g., spherical) field translated into physical (e.g., real-world or earth) coordinates. This may allow the electronic device 5602 to describe a three-dimensional audio space. This kind of elevation information may be utilized to reproduce elevated sound via a loudspeaker located in an elevated position (as in a 22.2 surround system, for example).

In some implementations, mapping 5706 the source location from the electronic device coordinates to physical coordinates may include detecting an electronic device 5602 orientation and/or detecting any change in the electronic device 5602 orientation. For example, the mapper 5610 may use sensor data 5608 from the at least one sensor 5604 (e.g., integrated accelerometer, proximity and microphone data) to determine an electronic device 5602 orientation (e.g., holding pattern). Similarly, the mapper 5610 may receive sensor data 5608 from the at least one sensor 5604 that indicates that the electronic device 5602 has changed from a horizontal orientation (e.g., a tabletop mode) to a vertical orientation (e.g., a browse-talk mode).

The electronic device 5602 may perform 5708 an operation based on the mapping 5612. For example, the electronic device 5602 may perform 5708 at least one operation based on the electronic device 5602 orientation (e.g., as indicated by the mapping 5612). Similarly, the electronic device 5602 may perform 5708 an operation based on a detected change in the electronic device 5602 orientation (e.g., as indicated by the mapping 5612). Specific examples of operations include switching the electronic device 5602 microphone configuration, tracking an audio source (in two or three dimensions, for instance), mapping a source location from physical coordinates into a three-dimensional display space, non-stationary noise suppression, filtering, displaying images based on audio signals, etc.

An example of mapping 5706 the source location from the electronic device coordinates to physical coordinates is given as follows. According to this example, the electronic device 5602 (e.g., the mapper 5610) may monitor the sensor data 5608 (e.g., accelerometer coordinate data), smooth the sensor data 5608 (using simple recursive weighting or Kalman smoothing, for example), and the operation block/module 5614 may perform an operation based on the mapping 5612 (e.g., mapping or projecting the audio signal source).

The electronic device 5602 may obtain a three-dimensional (3D) space defined by x-y-z basis vectors E=(e_(x), e_(y), e_(z)) in a coordinate system given by a form factor (e.g., FLUID), by using a gyro sensor for example. The electronic device 5602 may also specify the basis vector E′=(e_(x′), e_(y′), e_(z′)) in the physical (e.g., real-world) coordinate system based on the x-y-z position sensor data 5608. The electronic device 5602 may then obtain A=(e_(x″), e_(y″)), which is a basis vector space for any two-dimensional plane in the coordinate system. Given the search grid g=(x, y, z), the electronic device 5602 may project down to the plane (x″, y″) by taking the first two elements of the projection operation, where (x″, y″)=A(A^(T)A)⁻¹A^(T)(g·E′).

For example, assume that a device (e.g., phone) is held in browse-talk mode, so that E=([1 0 0]^(T), [0 1 0]^(T), [0 0 1]^(T)) and E′=([0 0 1]^(T), [0 1 0]^(T), [1 0 0]^(T)). Then, g=[0 0 1]^(T) in the device (e.g., phone) coordinate system and (g^(T)E)^(T)E′=[1 0 0]^(T). In order to project it down to A=([1 0 0]^(T), [0 1 0]^(T)), which is the real x-y plane (e.g., physical coordinates), A(A^(T)A)⁻¹A^(T)((g^(T)E)^(T)E′)=[1 0 0]^(T). It should be noted that the first two elements [1 0]^(T) may be taken after the projection operation. Accordingly, g in E may now be projected onto A as [1 0]^(T). Thus, [0 0 1]^(T) with a browse-talk mode in device (e.g., phone) x-y-z geometry corresponds to [1 0]^(T) for the real-world x-y plane.
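
The arithmetic of this browse-talk example can be checked numerically (Python with NumPy; the matrices simply restate the values given above):

```python
import numpy as np

E = np.eye(3)                               # device basis
E_prime = np.array([[0.0, 0.0, 1.0],        # world basis with x and z swapped,
                    [0.0, 1.0, 0.0],        # as in the browse-talk example
                    [1.0, 0.0, 0.0]])
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # real x-y plane basis
g = np.array([0.0, 0.0, 1.0])               # search grid along the device z axis

h = (g @ E) @ E_prime                       # convert into the physical frame
p = A @ np.linalg.inv(A.T @ A) @ A.T @ h    # project onto the real x-y plane
print(p[:2])                                # first two elements: [1. 0.]
```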

For a less complex approximation of the projection, the electronic device 5602 may apply a simple interpolation scheme among three set representations, defined as P(x′, y′)=αP_(x-y)(x′, y′)+βP_(x-z)(x′, y′)+γP_(y-z)(x′, y′), where α+β+γ=1 and each weight is a function of the angle between the real x-y plane and the corresponding set plane. Alternatively, the electronic device 5602 may use the representation given by P(x′, y′)=min(P_(x-y)(x′, y′), P_(x-z)(x′, y′), P_(y-z)(x′, y′)). In this example of mapping, a coordinate change portion is illustrated before the projection operation.

Additionally or alternatively, performing 5708 an operation may include mapping the source location from the physical coordinates into a three-dimensional display space. This may be accomplished as described in connection with one or more of FIGS. 52-53. Additional examples are provided below. For instance, the electronic device 5602 may render a sound source representation corresponding to the source location in a three-dimensional display space. In some configurations, the electronic device 5602 may render a plot (e.g., polar plot, rectangular plot) that includes the sound source representation on a two-dimensional plane corresponding to physical coordinates in the three-dimensional display space, where the plane is rendered based on the device orientation. In this way, performing 5708 the operation may include maintaining a source orientation in the three-dimensional display space regardless of the device orientation (e.g., rotation, tilt, pitch, yaw, roll, etc.). For instance, the plot will be aligned with physical coordinates regardless of how the device is oriented. In other words, the electronic device 5602 may compensate for device orientation changes in order to maintain the orientation of the plot in relation to physical coordinates. In some configurations, displaying the three-dimensional display space may include projecting the three-dimensional display space onto a two-dimensional display (for display on a two-dimensional pixel grid, for example).

FIG. 58 is a block diagram illustrating a more specific configuration ofan electronic device 5802 in which systems and methods for mappingelectronic device 5802 coordinates may be implemented. The electronicdevice 5802 may be an example of the electronic device 5602 described inconnection with FIG. 56. The electronic device 5802 may include at leastone sensor 5804, at least one microphone, a mapper 5810 and an operationblock/module 5814 that may be examples of corresponding elementsdescribed in connection with FIG. 56. In some implementations, the atleast one sensor 5804 may provide sensor data 5808, that may be anexample of the sensor data 5608 described in connection with FIG. 56, tothe mapper 5810.

The operation block/module 5814 may receive a reference orientation5816. In some implementations, the reference orientation 5816 may bestored in memory that is included in and/or coupled to the electronicdevice 5802. The reference orientation 5816 may indicate a referenceelectronic device 5602 orientation. For example, the referenceorientation 5816 may indicate an optimal electronic device 5602orientation (e.g., an optimal holding pattern). The optimal electronicdevice 5602 orientation may correspond to an orientation where a dualmicrophone configuration may be implemented. For example, the referenceorientation 5816 may be the orientation where the electronic device 5602is positioned between a vertical orientation and a horizontalorientation. In some implementations, electronic device 5602 (e.g.,phone) orientations that are horizontal and vertical are non-typicalholding patterns (e.g., not optimal electronic device 5602orientations). These positions (e.g., vertical and/or horizontal) may beidentified using sensors 5804 (e.g., accelerometers). In someimplementations, the intermediate positions (which may include thereference orientation 5816) may be positions for endfire dual microphonenoise suppression. By comparison, the horizontal and/or verticalorientations may be handled by broadside/single microphone noisesuppression.

In some implementations, the operation block/module 5814 may include a three-dimensional source projection block/module 5818, a two-dimensional source tracking block/module 5820, a three-dimensional source tracking block/module 5822, a microphone configuration switch 5824 and/or a non-stationary noise suppression block/module 5826.

The three-dimensional source tracking block/module 5822 may track anaudio signal source in three dimensions. For example, as the audiosignal source moves relative to the electronic device 5602, or as theelectronic device 5602 moves relative to the audio signal source, thethree-dimensional source tracking block/module 5822 may track thelocation of the audio signal source relative to the electronic device5802 in three dimensions. In some implementations, the three-dimensionalsource tracking block/module 5822 may track an audio signal source basedon the mapping 5812. In other words, the three-dimensional sourcetracking block/module 5822 may determine the location of the audiosignal source relative to the electronic device based on the electronicdevice 5802 orientation as indicated in the mapping 5812. In someimplementations, the three-dimensional source projection block/module5818 may project the source (e.g., the source tracked in threedimensions) into two-dimensional space. For example, thethree-dimensional source projection block/module 5818 may use at leastone algorithm to project a source tracked in three dimensions to adisplay in two dimensions.

In this implementation, the two-dimensional source tracking block/module5820 may track the source in two dimensions. For example, as the audiosignal source moves relative to the electronic device 5602, or as theelectronic device 5602 moves relative to the audio signal source, thetwo-dimensional source tracking block/module 5820 may track the locationof the audio signal source relative to the electronic device 5802 in twodimensions. In some implementations, the two-dimensional source trackingblock/module 5820 may track an audio signal source based on the mapping5812. In other words, the two-dimensional source tracking block/module5820 may determine the location of the audio signal source relative tothe electronic device based on the electronic device 5802 orientation asindicated in the mapping 5812.

The microphone configuration switch 5824 may switch the electronic device 5802 microphone configuration. For example, the microphone configuration switch 5824 may enable/disable at least one of the microphones. In some implementations, the microphone configuration switch 5824 may switch the microphone configuration based on the mapping 5812 and/or the reference orientation 5816. For example, when the mapping 5812 indicates that the electronic device 5802 is horizontal face-up on a flat surface (e.g., a tabletop mode), the microphone configuration switch 5824 may disable at least one microphone located on the back of the electronic device 5802. Similarly, when the mapping 5812 indicates that the electronic device 5802 orientation is different (by a certain amount, for example) from the reference orientation 5816, the microphone configuration switch 5824 may switch from a multi-microphone configuration (e.g., a dual-microphone configuration) to a single microphone configuration.

Additionally or alternatively, the non-stationary noise suppression block/module 5826 may perform non-stationary noise suppression based on the mapping 5812. In some implementations, the non-stationary noise suppression block/module 5826 may perform the non-stationary noise suppression independent of the electronic device 5802 orientation. For example, non-stationary noise suppression may include spatial processing such as beam-null forming and/or directional masking, which are discussed above.

FIG. 59 is a flow diagram illustrating a more specific configuration of a method 5900 for mapping electronic device 5802 coordinates. The method 5900 may be performed by the electronic device 5802. The electronic device 5802 may obtain 5902 sensor data 5808. In some implementations, this may be done as described in connection with FIG. 57.

The electronic device 5802 may determine 5904 a mapping 5812 of electronic device 5802 coordinates from a multi-microphone configuration to physical coordinates based on the sensor data 5808. In some implementations, this may be done as described in connection with FIG. 57.

The electronic device 5802 may determine 5906 an electronic device orientation based on the mapping 5812. For example, the mapper 5810 may receive sensor data 5808 from a sensor 5804 (e.g., an accelerometer). In this example, the mapper 5810 may use the sensor data 5808 to determine the electronic device 5802 orientation. In some implementations, the electronic device 5802 orientation may be based on a reference plane. For example, the electronic device 5802 may use polar coordinates to define an electronic device 5802 orientation. As will be described below, the electronic device 5802 may perform at least one operation based on the electronic device 5802 orientation.
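
One way a mapper might turn a single accelerometer sample into a coarse orientation relative to the horizontal reference plane is sketched below; the function name and the use of pitch/roll angles are illustrative assumptions, not the claimed method.

    import math

    def estimate_orientation(ax, ay, az):
        """Estimate pitch and roll (in degrees) from one accelerometer sample.

        ax, ay and az are accelerations along the device x, y and z axes. A device
        lying flat and face up senses gravity mostly on its z axis, giving
        approximately zero pitch and roll.
        """
        pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
        roll = math.degrees(math.atan2(ay, az))
        return pitch, roll

    print(estimate_orientation(0.0, 0.0, 9.81))   # (0.0, 0.0) for a face-up tabletop orientation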

In some implementations, the electronic device 5802 may provide a real-time source activity map to the user. In this example, the electronic device 5802 may determine 5906 an electronic device 5802 orientation (e.g., a user's holding pattern) by utilizing a sensor 5804 (e.g., an accelerometer and/or gyroscope). A variance of likelihood (directionality) may be given by a two-dimensional (2D) anglogram (or polar plot) for each electronic device 5802 orientation (e.g., holding pattern). In some cases, the variance may become significantly large (omni-directional) if the electronic device 5802 faces the plane made by two pairs orthogonally.

In some implementations, the electronic device 5802 may detect 5908 any change in the electronic device 5802 orientation based on the mapping 5812. For example, the mapper 5810 may monitor the electronic device 5802 orientation over time. In this example, the electronic device 5802 may detect 5908 any change in the electronic device 5802 orientation. For example, the mapper 5810 may indicate that an electronic device 5802 (e.g., a wireless communication device) has changed orientation from a handset mode (e.g., at the side of a user's head) to a browse-talk mode (e.g., in front of a user at eye level). As will be described below, the electronic device 5802 may perform at least one operation based on any change to the electronic device 5802 orientation.

Optionally, the electronic device 5802 (e.g., operation block/module 5814) may determine 5910 whether there is a difference between the electronic device 5802 orientation and the reference orientation 5816. For example, the electronic device 5802 may receive a mapping 5812 that indicates the electronic device 5802 orientation. The electronic device 5802 may also receive a reference orientation 5816. If the electronic device 5802 orientation and the reference orientation 5816 are not the same, the electronic device 5802 may determine that there is a difference between the electronic device 5802 orientation and the reference orientation 5816. As will be described below, the electronic device 5802 may perform at least one operation based on the difference between the electronic device 5802 orientation and the reference orientation 5816. In some implementations, determining 5910 whether there is a difference between the electronic device 5802 orientation and the reference orientation 5816 may include determining whether any difference is greater than a threshold amount. In this example, the electronic device 5802 may perform an operation based on the difference when the difference is greater than the threshold amount.

In some implementations, the electronic device 5802 may switch 5912 amicrophone configuration based on the electronic device 5802orientation. For example, the electronic device 5802 may selectmicrophone signals based on DOA information that maximize the spatialresolution of one or more sound sources in physical coordinates (e.g., a2D physical plane). Switching 5912 a microphone configuration mayinclude enabling/disabling microphones that are located at variouslocations on the electronic device 5802.

Switching 5912 a microphone configuration may be based on the mapping 5812 and/or the reference orientation 5816. In some configurations, switching 5912 between different microphone configurations may often include a certain systematic delay, as in the case of switching 5912 from a dual microphone configuration to a single microphone configuration. For example, the systematic delay may be around three seconds when there is an abrupt change in the electronic device 5802 orientation. By basing the switch 5912 on the mapping 5812 (e.g., and the sensor data 5808), switching 5912 from a dual microphone configuration to a single microphone configuration may be made seamlessly. In some implementations, switching 5912 a microphone configuration based on the mapping 5812 and/or the reference orientation 5816 may include switching 5912 a microphone configuration based on at least one of the electronic device 5802 orientation, any change in the electronic device 5802 orientation and any difference between the electronic device 5802 orientation and the reference orientation 5816.

A few examples of switching 5912 a microphone configuration are given as follows. In one example, the electronic device 5802 may be in the reference orientation 5816 (e.g., an optimal holding pattern). In this example, the electronic device 5802 may learn the sensor data 5808 (e.g., the accelerometer x-y-z coordinates). This may be based on a simple weighted average (e.g., alpha*history + (1−alpha)*current) or on more sophisticated Kalman smoothing, for example. If the electronic device 5802 determines 5910 that there is a significantly large difference between the tracked accelerometer statistic and the reference orientation 5816, the electronic device 5802 may switch 5912 from a multiple microphone configuration to a single microphone configuration.
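
A minimal sketch of the weighted-average tracking and threshold test described in this example follows; the smoothing factor and threshold values are assumptions chosen only for illustration.

    import numpy as np

    ALPHA = 0.95        # assumed smoothing factor for the tracked statistic
    THRESHOLD = 3.0     # assumed difference that counts as "significantly large"

    def track_orientation(history, current, alpha=ALPHA):
        """Simple weighted average: alpha * history + (1 - alpha) * current."""
        return alpha * np.asarray(history) + (1.0 - alpha) * np.asarray(current)

    def select_configuration(tracked, reference, threshold=THRESHOLD):
        """Switch to a single microphone configuration when the tracked accelerometer
        statistic differs too much from the reference orientation."""
        if np.linalg.norm(np.asarray(tracked) - np.asarray(reference)) > threshold:
            return "single_microphone"
        return "multiple_microphone"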

In another example, suppose that a user changes posture (e.g., fromsitting on a chair to lying down on a bed). If the user holds theelectronic device 5802 (e.g., phone) in an acceptable holding pattern(e.g., the electronic device 5802 is in the reference orientation 5816),the electronic device 5802 may continue to be in a multiple microphoneconfiguration, (e.g., a dual microphone configuration), and learn theaccelerometer coordinate (e.g., obtain 5902 the sensor data 5808).Furthermore, the electronic device 5802 may detect a user's posturewhile they are in a phone conversation, for example, by detecting theelectronic device 5802 orientation. Suppose that a user does not speakwhile he/she moves the electronic device 5802 (e.g., phone) away fromthe mouth. In this case, the electronic device 5802 may switch 5912 froma multiple microphone configuration to a single microphone configurationand the electronic device 5802 may remain in the single microphoneconfiguration. However, as soon as the user speaks while holding theelectronic device 5802 in an optimal holding pattern (e.g., in thereference orientation 5816), the electronic device 5802 will switch backto the multiple microphone configuration (e.g., a dual-microphoneconfiguration).

In another example, the electronic device 5802 may be in a horizontal face-down orientation (e.g., a user lies down on a bed holding the electronic device while the display of the electronic device 5802 faces downward toward the top of the bed). This electronic device 5802 orientation may be easily detected because the z coordinate is negative, as sensed by the sensor 5804 (e.g., the accelerometer). Additionally or alternatively, for the user's pose change from sitting to lying on a bed, the electronic device 5802 may also learn the user's pose using phase and level differences between frames. As soon as the user uses the electronic device 5802 in the reference orientation 5816 (e.g., holds the electronic device 5802 in the optimal holding pattern), the electronic device 5802 may perform optimal noise suppression. Sensors 5804 (e.g., integrated accelerometer and microphone data) may then be used in the mapper 5810 to determine the electronic device 5802 orientation (e.g., the holding pattern of the electronic device 5802), and the electronic device 5802 may perform an operation (e.g., select the appropriate microphone configuration). More specifically, front and back microphones may be enabled, or front microphones may be enabled while back microphones are disabled. Either of these configurations may be in effect while the electronic device 5802 is in a horizontal orientation (e.g., speakerphone or tabletop mode).

In another example, a user may change the electronic device 5802 (e.g.,phone) holding pattern (e.g., electronic device 5802 orientation) fromhandset usage to speakerphone or vice versa. By addingaccelerometer/proximity sensor data 5808, the electronic device 5802 maymake a microphone configuration switch seamlessly and adjust microphonegain and speaker volume (or earpiece to larger loudspeaker switch). Forexample, suppose that a user puts the electronic device 5802 (e.g.,phone) face down. In some implementations, the electronic device 5802may also track the sensor 5804 so that the electronic device 5802 maytrack if the electronic device 5802 (e.g., phone) is facing down or up.If the electronic device 5802 (e.g., phone) is facing down, theelectronic device 5802 may provide speaker phone functionality. In someimplementations, the electronic device may prioritize the proximitysensor result. In other words, if the sensor data 5808 indicates that anobject (e.g., a hand or a desk) is near to the ear, the electronicdevice may not switch 5912 to speakerphone.

Optionally, the electronic device 5802 may track 5914 a source in three dimensions based on the mapping 5812. For example, the electronic device 5802 may track an audio signal source in three dimensions as it moves relative to the electronic device 5802. In this example, the electronic device 5802 may project 5916 the source (e.g., source location) into a two-dimensional space. For example, the electronic device 5802 may project 5916 the source that was tracked in three dimensions onto a two-dimensional display in the electronic device 5802. Additionally, the electronic device 5802 may switch 5918 to tracking the source in two dimensions. For example, the electronic device 5802 may track in two dimensions an audio signal source as it moves relative to the electronic device 5802. Depending on an electronic device 5802 orientation, the electronic device 5802 may select corresponding nonlinear pairs of microphones and provide a 360-degree two-dimensional representation with a proper two-dimensional projection. For example, the electronic device 5802 may provide a visualization of two-dimensional, 360-degree source activity regardless of the electronic device 5802 orientation (e.g., holding pattern, such as speakerphone mode, portrait browse-talk mode, landscape browse-talk mode or any combination in between). The electronic device 5802 may interpolate the visualization to a two-dimensional representation for in-between holding patterns. In fact, the electronic device 5802 may even render a three-dimensional visualization using three sets of two-dimensional representations.

In some implementations, the electronic device 5802 may perform 5920non-stationary noise suppression. Performing 5920 non-stationary noisesuppression may suppress a noise audio signal from a target audio signalto improve spatial audio processing. In some implementations, theelectronic device 5802 may be moving during the noise suppression. Inthese implementations, the electronic device 5802 may perform 5920non-stationary noise suppression independent of the electronic device5802 orientation. For example, if a user mistakenly rotates a phone butstill wants to focus on some target direction, then it may be beneficialto maintain that target direction regardless of the device orientation.

FIG. 60 is a flow diagram illustrating one configuration of a method 6000 for performing 5708 an operation based on the mapping 5812. The method 6000 may be performed by the electronic device 5802. The electronic device 5802 may detect 6002 any change in the sensor data 5808. In some implementations, detecting 6002 any change in the sensor data 5808 may include detecting whether a change in the sensor data 5808 is greater than a certain amount. For example, the electronic device 5802 may detect 6002 whether there is a change in accelerometer data that is greater than a determined threshold amount.

The electronic device 5802 may determine 6004 if the sensor data 5808indicates that the electronic device 5802 is in one of a horizontal orvertical position or that the electronic device 5802 is in anintermediate position. For example, the electronic device 5802 maydetermine whether the sensor data 5808 indicates that the electronicdevice 5802 is in a tabletop mode (e.g., horizontal face-up on asurface) or a browse-talk mode (e.g., vertical at eye level) or whetherthe electronic device 5802 is in a position other than vertical orhorizontal (e.g., which may include the reference orientation 5816).

If the electronic device 5802 determines 6004 that the sensor data 5808 indicates that the electronic device 5802 is in an intermediate position, the electronic device 5802 may use 6006 a dual microphone configuration. If the electronic device 5802 was not previously using a dual microphone configuration, using 6006 a dual microphone configuration may include switching to a dual microphone configuration. By comparison, if the electronic device 5802 was previously using a dual microphone configuration, using 6006 a dual microphone configuration may include maintaining a dual microphone configuration.

If the electronic device 5802 determines 6004 that the sensor data 5808indicates that the electronic device 5802 is in a horizontal or verticalposition, the electronic device 5802 may determine 6008 if a near fieldphase/gain voice activity detector (VAD) is active. In other words, theelectronic device 5802 may determine if the electronic device 5802 islocated close to the audio signal source (e.g., a user's mouth). If theelectronic device 5802 determines 6008 that a near field phase/gainvoice activity detector is active (e.g., the electronic device 5802 isnear the user's mouth), the electronic device 5802 may use 6006 a dualmicrophone configuration.

If the electronic device 5802 determines 6008 that a near field phase/gain voice activity detector is not active (e.g., the electronic device 5802 is not located close to the audio signal source), the electronic device 5802 may use 6010 a single microphone configuration. If the electronic device 5802 was not previously using a single microphone configuration, using 6010 a single microphone configuration may include switching to a single microphone configuration. By comparison, if the electronic device 5802 was previously using a single microphone configuration, using 6010 a single microphone configuration may include maintaining a single microphone configuration. In some implementations, using 6010 a single microphone configuration may include using broadside/single microphone noise suppression.
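
The decision flow of FIG. 60 might be summarized in code roughly as follows; the boolean inputs stand in for the position determination 6004 and the near-field phase/gain VAD determination 6008 and are assumptions made for illustration only.

    def choose_microphone_configuration(horizontal_or_vertical, near_field_vad_active):
        """Return the microphone configuration suggested by the FIG. 60 flow.

        Intermediate positions use a dual microphone configuration; horizontal or
        vertical positions use it only while the near-field phase/gain VAD is active.
        """
        if not horizontal_or_vertical:
            return "dual_microphone"
        if near_field_vad_active:
            return "dual_microphone"
        return "single_microphone"

    print(choose_microphone_configuration(True, False))   # "single_microphone"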

FIG. 61 is a flow diagram illustrating another configuration of a method6100 for performing 5708 an operation based on the mapping 5812. Themethod 6100 may be performed by the electronic device 5802. Theelectronic device 5802 may detect 6102 any change in the sensor data5808. In some implementations, this may be done as described inconnection with FIG. 60.

The electronic device 5802 may determine 6104 if the sensor data 5808indicates that the electronic device 5802 is in a tabletop position orin an intermediate or vertical position. For example, the electronicdevice 5802 may determine 6104 if the sensor data 5808 indicates thatthe electronic device 5802 is horizontal face-up on a surface (e.g., atabletop position) or whether the electronic device 5802 is vertical(e.g., a browse-talk position) or in a position other than vertical orhorizontal (e.g., which may include the reference orientation 5816).

If the electronic device 5802 determines 6104 that the sensor data 5808indicates that the electronic device 5802 is in an intermediateposition, the electronic device 5802 may use 6106 front and backmicrophones. In some implementations, using 6106 front and backmicrophones may include enabling/disabling at least one microphone.

If the electronic device 5802 determines 6104 that the sensor data 5808 indicates that the electronic device 5802 is in a tabletop position, the electronic device 5802 may determine 6108 if the electronic device 5802 is facing up. In some implementations, the electronic device 5802 may determine 6108 if the electronic device 5802 is facing up based on the sensor data 5808. If the electronic device 5802 determines 6108 that the electronic device 5802 is facing up, the electronic device 5802 may use 6110 front microphones. For example, the electronic device may use 6110 at least one microphone located on the front of the electronic device 5802. In some implementations, using 6110 front microphones may include enabling/disabling at least one microphone. For example, using 6110 front microphones may include disabling at least one microphone located on the back of the electronic device 5802.

If the electronic device 5802 determines 6108 that the electronic device 5802 is not facing up (e.g., the electronic device 5802 is facing down), the electronic device 5802 may use 6112 back microphones. For example, the electronic device may use 6112 at least one microphone located on the back of the electronic device 5802. In some implementations, using 6112 back microphones may include enabling/disabling at least one microphone. For example, using 6112 back microphones may include disabling at least one microphone located on the front of the electronic device 5802.
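
Similarly, the FIG. 61 flow could be sketched as below; the boolean inputs stand in for the tabletop determination 6104 and the face-up determination 6108.

    def choose_microphones(tabletop, facing_up):
        """Return the microphone set suggested by the FIG. 61 flow.

        Intermediate (or vertical) positions use front and back microphones; a
        tabletop position uses front microphones when the device faces up and
        back microphones when it faces down.
        """
        if not tabletop:
            return "front_and_back"
        return "front" if facing_up else "back"

    print(choose_microphones(True, False))   # "back"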

FIG. 62 is a block diagram illustrating one configuration of a userinterface 6228 in which systems and methods for displaying a userinterface 6228 on an electronic device 6202 may be implemented. In someimplementations, the user interface 6228 may be displayed on anelectronic device 6202 that may be an example of the electronic device5602 described in connection with FIG. 56. The user interface 6228 maybe used in conjunction with and/or independently from themulti-microphone configurations described herein. The user interface6228 may be presented on a display 6264 (e.g., a screen) of theelectronic device 6202. The display 6264 may also present a sectorselection feature 6232. In some implementations, the user interface 6228may provide an editable mode and a fixed mode. In an editable mode, theuser interface 6228 may respond to input to manipulate at least onefeature (e.g., sector selection feature) of the user interface 6228. Ina fixed mode, the user interface 6228 may not respond to input tomanipulate at least one feature of the user interface 6228.

The user interface 6228 may include information. For example, the userinterface 6228 may include a coordinate system 6230. In someimplementations, the coordinate system 6230 may be a reference for audiosignal source location. The coordinate system 6230 may correspond tophysical coordinates. For example, sensor data 5608 (e.g., accelerometerdata, gyro data, compass data, etc.) may be used to map electronicdevice 6202 coordinates to physical coordinates as described in FIG. 57.In some implementations, the coordinate system 6230 may correspond to aphysical space independent of earth coordinates.

The user interface 6228 may display a directionality of audio signals.For example, the user interface 6228 may include audio signal indicatorsthat indicate the direction of the audio signal source. The angle of theaudio signal source may also be indicated in the user interface 6228.The audio signal(s) may be a voice signal. In some implementations, theaudio signals may be captured by the at least one microphone. In thisimplementation, the user interface 6228 may be coupled to the at leastone microphone. The user interface 6228 may display a 2D anglogram ofcaptured audio signals. In some implementations, the user interface 6228may display a 2D plot in 3D perspective to convey an alignment of theplot with a plane that is based on physical coordinates in the realworld, such as the horizontal plane. In this implementation, the userinterface 6228 may display the information independent of the electronicdevice 6202 orientation.

In some implementations, the user interface 6228 may display audiosignal indicators for different types of audio signals. For example, theuser interface 6228 may include an anglogram of a voice signal and anoise signal. In some implementations, the user interface 6228 mayinclude icons corresponding to the audio signals. For example, as willbe described below, the display 6264 may include icons corresponding tothe type of audio signal that is displayed. Similarly, as will bedescribed below, the user interface 6228 may include icons correspondingto the source of the audio signal. The position of these icons in thepolar plot may be smoothed in time. As will be described below, the userinterface 6228 may include one or more elements to carry out thefunctions described herein. For example, the user interface 6228 mayinclude an indicator of a selected sector and/or may display icons forediting a selected sector.

The sector selection feature 6232 may allow selection of at least onesector of the physical coordinate system 6230. The sector selectionfeature 6232 may be implemented by at least one element included in theuser interface 6228. For example, the user interface 6228 may include aselected sector indicator that indicates a selected sector. In someimplementations, the sector selection feature 6232 may operate based ontouch input. For example, the sector selection feature 6232 may allowselection of a sector based on a single touch input (e.g., touching,swiping and/or circling an area of the user interface 6228 correspondingto a sector). In some implementations, the sector selection feature 6232may allow selection of multiple sectors at the same time. In thisexample, the sector selection feature 6232 may allow selection of themultiple sectors based on multiple touch inputs. It should be understoodthat the electronic device 6202 may include circuitry, a processorand/or instructions for producing the user interface 6228.

FIG. 63 is a flow diagram illustrating one configuration of a method 6300 for displaying a user interface 6228 on an electronic device 6202. The method 6300 may be performed by the electronic device 6202. The electronic device 6202 may obtain 6302 sensor data (e.g., accelerometer data, tilt sensor data, orientation data, etc.) that corresponds to physical coordinates.

The electronic device 6202 may present 6304 the user interface 6228, forexample on a display 6264 of the electronic device 6202. In someimplementations, the user interface 6228 may include the coordinatesystem 6230. As described above, the coordinate system 6230 may be areference for audio signal source location. The coordinate system 6230may correspond to physical coordinates. For example, sensor data 5608(e.g., accelerometer data, gyro data, compass data, etc.) may be used tomap electronic device 6202 coordinates to physical coordinates asdescribed above.

In some implementations, presenting 6304 the user interface 6228 thatmay include the coordinate system 6230 may include presenting 6304 theuser interface 6228 and the coordinate system 6230 in an orientationthat is independent of the electronic device 6202 orientation. In otherwords, as the electronic device 6202 orientation changes (e.g., theelectronic device 6202 rotates), the coordinate system 6230 may maintainorientation. In some implementations, the coordinate system 6230 maycorrespond to a physical space independent of earth coordinates.

The electronic device 6202 may provide 6306 a sector selection feature6232 that allows selection of at least one sector of the coordinatesystem 6230. As described above, the electronic device 6202 may provide6306 a sector selection feature via the user interface 6228. Forexample, the user interface 6228 may include at least one element thatallows selection of at least one sector of the coordinate system 6230.For example, the user interface 6228 may include an indicator thatindicates a selected sector.

The electronic device 6202 may also include a touch sensor that allows touch input selection of the at least one sector. For example, the electronic device 6202 may select (and/or edit) one or more sectors and/or one or more audio signal indicators based on one or more touch inputs. Some examples of touch inputs include one or more taps, swipes, patterns (e.g., symbols, shapes, etc.), pinches, spreads, multi-touch rotations, etc. In some configurations, the electronic device 6202 (e.g., user interface 6228) may select a displayed audio signal indicator (and/or sector) when one or more taps, a swipe, a pattern, etc., intersects with the displayed audio signal indicator (and/or sector). Additionally or alternatively, the electronic device 6202 (e.g., user interface 6228) may select a displayed audio signal indicator (and/or sector) when a pattern (e.g., a circular area, rectangular area or area within a pattern) fully or partially surrounds or includes the displayed audio signal indicator (and/or sector). It should be noted that one or more audio signal indicators and/or sectors may be selected at a time.
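
For illustration, mapping a tap on the displayed coordinate system to a sector could look like the sketch below; the sector count, the plot center and the equal-wedge layout are assumptions rather than part of the described interface.

    import math

    SECTOR_COUNT = 4   # assumed number of equal sectors in the polar coordinate system

    def sector_for_touch(x, y, center_x, center_y, sector_count=SECTOR_COUNT):
        """Map a touch point to a sector index.

        Screen y grows downward, so it is negated to measure the angle
        counter-clockwise from the positive x axis of the plot.
        """
        angle = math.degrees(math.atan2(center_y - y, x - center_x)) % 360.0
        return int(angle // (360.0 / sector_count))

    # A tap to the right of the plot center falls in the first sector (0-90 degrees here).
    print(sector_for_touch(300, 200, 200, 200))   # 0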

In some configurations, the electronic device 6202 (e.g., user interface 6228) may edit one or more sectors and/or audio signal indicators based on one or more touch inputs. For example, the user interface 6228 may present one or more options (e.g., one or more buttons, a drop-down menu, etc.) that provide options for editing the audio signal indicator or selected audio signal indicator (e.g., selecting an icon or image for labeling the audio signal indicator, selecting or changing a color, pattern and/or image for the audio signal indicator, setting whether a corresponding audio signal should be filtered (e.g., blocked or passed), zooming in or out on the displayed audio signal indicator, etc.). Additionally or alternatively, the user interface 6228 may present one or more options (e.g., one or more buttons, a drop-down menu, etc.) that provide options for editing the sector (e.g., selecting or changing a color, pattern and/or image for the sector, setting whether audio signals in the sector should be filtered (e.g., blocked or passed), zooming in or out on the sector, adjusting sector size (by expanding or contracting the sector, for example), etc.). For instance, a pinch touch input may correspond to reducing or narrowing sector size, while a spread may correspond to enlarging or expanding sector size.
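
A pinch or spread that resizes a selected sector about its center might be handled as in the following sketch; treating the gesture as a single scale factor is an assumption made for brevity.

    def resize_sector(start_deg, end_deg, scale):
        """Expand (scale > 1, spread) or narrow (scale < 1, pinch) a sector
        about its center, keeping the angles within 0-360 degrees."""
        center = (start_deg + end_deg) / 2.0
        half_width = (end_deg - start_deg) / 2.0 * scale
        return (center - half_width) % 360.0, (center + half_width) % 360.0

    # Pinching the 225-315 degree sector of FIG. 71 down to half its width:
    print(resize_sector(225.0, 315.0, 0.5))   # (247.5, 292.5)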

The electronic device 6202 may provide 6308 a sector editing feature that allows editing of the at least one sector. For example, the sector editing feature may enable adjusting (e.g., enlarging, reducing, shifting, etc.) the sector as described herein.

In some configurations, the electronic device 6202 (e.g., display 6264)may additionally or alternatively display a target audio signal and aninterfering audio signal on the user interface. The electronic device6202 (e.g., display 6264) may display a directionality of the targetaudio signal and/or the interfering audio signal captured by one or moremicrophones. The target audio signal may include a voice signal.

FIG. 64 is a block diagram illustrating one configuration of a userinterface 6428 in which systems and methods for displaying a userinterface 6428 on an electronic device 6402 may be implemented. In someimplementations, the user interface 6428 may be included on a display6464 of an electronic device 6402 that may be examples of correspondingelements described in connection with FIG. 62. The electronic device6402 may include a user interface 6428, at least one microphone 6406, anoperation block/module 6414, a display 6464 and/or a sector selectionfeature 6432 that may be examples of corresponding elements described inone or more of FIGS. 56 and 62.

In some implementations, the user interface 6428 may present a sectorediting feature 6436, and/or a user interface alignment block/module6440. The sector editing feature 6436 may allow for editing of at leastone sector. For example, the sector editing feature 6436 may allowediting of at least one selected sector of the physical coordinatesystem 6430. The sector editing feature 6436 may be implemented by atleast one element included in the display 6464. For example, the userinterface 6428 may include at least one touch point that allows a userto adjust the size of a selected sector. In some implementations, thesector editing feature 6436 may operate based on touch input. Forexample, the sector editing feature 6436 may allow editing of a selectedsector based on a single touch input. In some implementations, thesector editing feature 6436 may allow for at least one of adjusting thesize of a sector, adjusting the shape of a sector, adjusting theboundaries of a sector and/or zooming in on the sector. In someimplementations, the sector editing feature 6436 may allow editing ofmultiple sectors at the same time. In this example, the sector editingfeature 6436 may allow editing of the multiple sectors based on multipletouch inputs.

As described above, in certain implementations, at least one of thesector selection feature 6432 and the sector editing feature 6436 mayoperate based on a single touch input or multiple touch inputs. Forexample, the sector selection feature 6432 may be based on one or moreswipe inputs. For instance, the one or more swipe inputs may indicate acircular region. In some configurations, the one or more swipe inputsmay be a single swipe. The sector selection feature 6432 may be based onsingle or multi-touch input. Additionally or alternatively, theelectronic device 6402 may adjust a sector based on a single ormulti-touch input.

In these examples, the display 6464 may include a touch sensor 6438 thatmay receive touch input (e.g., a tap, a swipe or circular motion) thatselects a sector. The touch sensor 6438 may also receive touch inputthat edits a sector, for example, by moving touch points displayed onthe display 6464. In some configurations, the touch sensor 6438 may beintegrated with the display 6464. In other configurations, the touchsensor 6438 may be implemented separately in the electronic device 6402or may be coupled to the electronic device 6402.

The user interface alignment block/module 6440 may align all or part ofthe user interface 6428 with a reference plane. In some implementations,the reference plane may be horizontal (e.g., parallel to ground or afloor). For example, the user interface alignment block/module 6440 mayalign part of the user interface 6428 that displays the coordinatesystem 6430. In some implementations, the user interface alignmentblock/module 6440 may align all or part of the user interface 6428 inreal time.

In some configurations, the electronic device 6402 may include at least one image sensor 6434. For example, several image sensors 6434 may be included within an electronic device 6402 (in addition to or as an alternative to multiple microphones 6406). The at least one image sensor 6434 may collect data relating to the electronic device 6402 (e.g., image data). For example, a camera (e.g., an image sensor) may generate an image. In some implementations, the at least one image sensor 6434 may provide image data to the display 6464.

The electronic device 6402 may pass audio signals (e.g., a target audio signal) included within at least one sector. For example, the electronic device 6402 may pass audio signals to an operation block/module 6414. The operation block/module 6414 may pass one or more audio signals indicated within the at least one sector. In some implementations, the operation block/module 6414 may include an attenuator 6442 that attenuates an audio signal. For example, the operation block/module 6414 (e.g., attenuator 6442) may attenuate (e.g., block, reduce and/or reject) audio signals not included within the at least one selected sector (e.g., interfering audio signal(s)). In some cases, the audio signals may include a voice signal. For instance, the sector selection feature may allow attenuation of undesirable audio signals aside from a user voice signal.

In some configurations, the electronic device (e.g., the display 6464and/or operation block/module 6414) may indicate image data from theimage sensor(s) 6434. In one configuration, the electronic device 6402(e.g., operation block/module 6414) may pass image data (and filterother image data, for instance) from the at least one image sensor 6434based on the at least one sector. In other words, at least one of thetechniques described herein regarding the user interface 6428 may beapplied to image data alternatively from or in addition to audiosignals.

FIG. 65 is a flow diagram illustrating a more specific configuration of a method 6500 for displaying a user interface 6428 on an electronic device 6402. The method may be performed by the electronic device 6402. The electronic device 6402 may obtain 6502 a coordinate system 6430 that corresponds to physical coordinates. In some implementations, this may be done as described in connection with FIG. 63.

The electronic device 6402 may present 6504 a user interface 6428 that includes the coordinate system 6430. In some implementations, this may be done as described in connection with FIG. 63.

The electronic device 6402 may display 6506 a directionality of at leastone audio signal captured by at least one microphone. In other words,the electronic device 6402 may display the location of an audio signalsource relative to the electronic device. The electronic device 6402 mayalso display the angle of the audio signal source in the display 6464.As described above, the electronic device 6402 may display a 2Danglogram of captured audio signals. In some implementations, thedisplay 6464 may display a 2D plot in 3D perspective to convey analignment of the plot with a plane that is based on physical coordinatesin the real world, such as the horizontal plane.

The electronic device 6402 may display 6508 an icon corresponding to theat least one audio signal (e.g., corresponding to a wave patterndisplayed on the user interface 6428). According to some configurations,the electronic device 6402 (e.g., display 6464) may display 6508 an iconthat identifies an audio signal as being a target audio signal (e.g.,voice signal). Additionally or alternatively, the electronic device 6402(e.g., display 6464) may display 6508 an icon (e.g., a different icon)that identifies an audio signal as being noise and/or interference(e.g., an interfering or interference audio signal).

In some implementations, the electronic device 6402 may display 6508 anicon that corresponds to the source of an audio signal. For example, theelectronic device 6402 may display 6508 an image icon indicating thesource of a voice signal, for example, an image of an individual. Theelectronic device 6402 may display 6508 multiple icons corresponding tothe at least one audio signal. For example, the electronic device maydisplay at least one image icon and/or icons that identify the audiosignal as a noise/interference signal or a voice signal.

The electronic device 6402 (e.g., user interface 6428) may align 6510 all or part of the user interface 6428 with a reference plane. For example, the electronic device 6402 may align 6510 the coordinate system 6430 with a reference plane. In some configurations, aligning 6510 all or part of the user interface 6428 may include mapping (e.g., projecting) a two-dimensional plot (e.g., polar plot) into a three-dimensional display space. Additionally or alternatively, the electronic device 6402 may align one or more of the sector selection feature 6432 and the sector editing feature 6436 with a reference plane. The reference plane may be horizontal (e.g., correspond to earth coordinates). In some implementations, the part of the user interface 6428 that is aligned with the reference plane may be aligned with the reference plane independent of the electronic device 6402 orientation. In other words, as the electronic device 6402 translates and/or rotates, all or part of the user interface 6428 that is aligned with the reference plane may remain aligned with the reference plane. In some implementations, the electronic device 6402 may align 6510 all or part of the user interface 6428 in real time.
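
Keeping the displayed plot aligned with physical coordinates can be viewed as rotating the plot by the negative of the device rotation reported by the sensors. The sketch below, assuming numpy, shows this compensation for rotation about the vertical axis only; full tilt compensation would use the complete mapping described earlier.

    import numpy as np

    def align_with_reference_plane(plot_points, device_yaw_deg):
        """Rotate plot points by the negative of the device yaw so the displayed
        coordinate system stays aligned with physical coordinates.

        plot_points: (N, 2) array of x-y points in the plot plane.
        device_yaw_deg: device rotation about the vertical axis, from sensor data.
        """
        theta = np.radians(-device_yaw_deg)
        rotation = np.array([[np.cos(theta), -np.sin(theta)],
                             [np.sin(theta),  np.cos(theta)]])
        return np.asarray(plot_points) @ rotation.T

    # With the device rotated 90 degrees, the point that was drawn at (1, 0) is
    # redrawn at (0, -1) so it stays fixed with respect to physical coordinates.
    print(np.round(align_with_reference_plane([[1.0, 0.0]], 90.0), 3))   # [[ 0. -1.]]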

The electronic device 6402 may provide 6512 a sector selection feature 6432 that allows selection of at least one sector of the coordinate system 6430. In some implementations, this may be done as described in connection with FIG. 63.

In some implementations, the electronic device 6402 (e.g., user interface 6428 and/or sector selection feature 6432) may pad 6514 a selected sector. For example, the electronic device 6402 may include additional information with the audio signal to improve spatial audio processing. Padding may also refer to visual feedback provided as highlighted (e.g., brightly colored) padding for the selected sector. For example, the selected sector 7150 (e.g., the outline of the sector) illustrated in FIG. 71 may be highlighted to enable easy identification of the selected sector.

The electronic device 6402 (e.g., the display 6464, the user interface6428, etc.) may provide 6516 a sector editing feature 6436 that allowsediting at least one sector. As described above, the electronic device6402 may provide 6516 a sector editing feature 6436 via the userinterface 6428. In some implementations, the sector editing feature 6436may operate based on touch input. For example, the sector editingfeature 6436 may allow editing of a selected sector based on a single ormultiple touch inputs. For instance, the user interface 6428 may includeat least one touch point that allows a user to adjust the size of aselected sector. In this implementation, the electronic device 6402 mayprovide a touch sensor 6438 that receives touch input that allowsediting of the at least one sector.

The electronic device 6402 may provide 6518 a fixed mode and an editablemode. In an editable mode, the user interface 6428 may respond to inputto manipulate at least one feature (e.g., sector selection feature 6432)of the user interface 6428. In a fixed mode, the user interface 6428 maynot respond to input to manipulate at least one feature of the userinterface 6428. In some implementations, the electronic device 6402 mayallow selection between a fixed mode and an editable mode. For example,a radio button of the user interface 6428 may allow for selectionbetween an editable mode and a fixed mode.

The electronic device 6402 may pass 6520 audio signals indicated within at least one sector. For example, the electronic device 6402 may pass 6520 audio signals indicated in a selected sector. In some implementations, the electronic device 6402 may attenuate 6522 an audio signal. For example, the electronic device 6402 may attenuate 6522 (e.g., reduce and/or reject) audio signals not included within the at least one selected sector. For example, the audio signals may include a voice signal. In this example, the electronic device 6402 may attenuate 6522 undesirable audio signals aside from a user voice signal.

FIG. 66 illustrates examples of user interfaces 6628 a-b for displaying a directionality of at least one audio signal. In some implementations, the user interfaces 6628 a-b may be examples of the user interface 6228 described in connection with FIG. 62. The user interfaces 6628 a-b may include coordinate systems 6630 a-b that may be examples of the coordinate system 6230 described in connection with FIG. 62.

In FIG. 66, an electronic device 6202 (e.g., phone) may be lying flat.This may occur, for example, in a tabletop mode. In FIG. 66, thecoordinate systems 6630 a-b may include at least one audio signalindicator 6646 a-b that may indicate the directionality of at least oneaudio signal (according to an angle or range of angles, for instance).The at least one audio signal may originate from a person, a speaker, oranything that can create an audio signal. In a first user interface 6628a, a first audio signal indicator 6646 a may indicate that a first audiosignal is at roughly 180 degrees. By comparison, in a second userinterface 6628 b, a second audio signal indicator 6646 b may indicatethat a second audio signal is at roughly 270 degrees. In someimplementations, the audio signal indicators 6646 a-b may indicate thestrength of the audio signal. For example, the audio signal indicators6646 a-b may include a gradient of at least one color that indicates thestrength of an audio signal.

The first user interface 6628 a provides examples of one or morecharacteristics that may be included in one or more of the userinterfaces described herein. For example, the first user interface 6628a includes a title portion 6601. The title portion 6601 may include atitle of the user interface or application that provides the userinterface. In the example illustrated in FIG. 66, the title is “SFAST.”Other titles may be utilized. In general, the title portion 6601 isoptional: some configurations of the user interface may not include atitle portion. Furthermore, it should be noted that the title portionmay be located anywhere on the user interface (e.g., top, bottom,center, left, right and/or overlaid, etc.).

In the example illustrated in FIG. 66, the first user interface 6628 aincludes a control portion 6603. The control portion 6603 includesexamples of interactive controls. In some configurations, one or more ofthese interactive controls may be included in a user interface describedherein. In general, the control portion 6603 may be optional: someconfigurations of the user interface may not include a control portion6603. Furthermore, the control portion may or may not be grouped asillustrated in FIG. 66. For example, one or more of the interactivecontrols may be located in different sections of the user interface(e.g., top, bottom, center, left, right and/or overlaid, etc.).

In the example illustrated in FIG. 66, the first user interface 6628 aincludes an activation/deactivation button 6607, check boxes 6609, atarget sector indicator 6611, radio buttons 6613, a smoothing slider6615, a reset button 6617 and a noise suppression (NS) enable button6619. It should be noted, however, that the interactive controls may beimplemented in a wide variety of configurations. For example, one ormore of slider(s), radio button(s), button(s), toggle button(s), checkbox(es), list(s), dial(s), tab(s), text box(es), drop-down list(s),link(s), image(s), grid(s), table(s), label(s), etc., and/orcombinations thereof may be implemented in the user interface to controlvarious functions.

The activation/deactivation button 6607 may generally activate ordeactivate functionality related to the first user interface 6628 a. Forexample, when an event (e.g., touch event) corresponding to theactivation/deactivation button 6607 occurs, the user interface 6628 amay enable user interface interactivity and display an audio signalindicator 6646 a in the case of activation or may disable user interfaceinteractivity and pause or discontinue displaying the audio signalindicator 6646 a in the case of deactivation.

The check boxes 6609 may enable or disable display of a target audio signal and/or an interferer audio signal. For example, the show interferer and show target check boxes enable visual feedback on the detected angle of the detected/computed interferer and target audio signal(s), respectively. For example, the “show interferer” element may be paired with the “show target” element, and together they enable visualizing points for target and interference locations in the user interface 6628 a. In some configurations, the “show interferer” and “show target” elements may enable/disable display of an actual picture of a target source or interferer source (e.g., an actual face, an icon, etc.) at the angle location detected by the device.

The target sector indicator 6611 may provide an indication of a selected or target sector. In this example, all sectors are indicated as the target sector. Another example is provided in connection with FIG. 71 below.

The radio buttons 6613 may enable selection of a fixed or editable sector mode. In the fixed mode, one or more sectors (e.g., selected sectors) may not be adjusted. In the editable mode, one or more sectors (e.g., selected sectors) may be adjusted.

The smoothing slider 6615 may provide selection of a value used to filter the input. For example, a value of 0 indicates that there is no filter, whereas a value of 25 may indicate aggressive filtering. In some configurations, the smoothing slider 6615 controls an amount of smoothing for displaying the source activity polar plot. For instance, the amount of smoothing may be based on the value indicated by the smoothing slider 6615, where recursive smoothing is performed (e.g., polar = (1−alpha)*polar + (alpha)*polar_current_frame, so a smaller alpha means more smoothing).
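
The recursive smoothing described for the slider might look like the following; the mapping from slider value to alpha is an assumed linear one, chosen only so that 0 gives no filtering and 25 gives the heaviest smoothing.

    import numpy as np

    def smooth_polar(polar, polar_current_frame, slider_value, slider_max=25):
        """polar = (1 - alpha) * polar + alpha * polar_current_frame.

        A larger slider value maps to a smaller alpha, so more of the previous
        plot is kept (more smoothing); slider 0 passes the current frame through.
        """
        alpha = 1.0 - (slider_value / float(slider_max))
        return (1.0 - alpha) * np.asarray(polar) + alpha * np.asarray(polar_current_frame)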

The reset button 6617 may enable clearing of one or more current userinterface 6628 a settings. For example, when a touch event correspondingto the reset button 6617 occurs, the user interface 6628 a may clear anysector selections, may clear whether the target and/or interferer audiosignals are displayed and/or may reset the smoothing slider to a defaultvalue. The noise suppression (NS) enable button 6619 may enable ordisable noise suppression processing on the input audio signal(s). Forexample, an electronic device may enable or disable filteringinterfering audio signal(s) based on the noise suppression (NS) enablebutton 6619.

The user interface 6628 a may include a coordinate system portion 6605 (e.g., a plot portion). In some configurations, the coordinate system portion 6605 may occupy the entire user interface 6628 a (and/or an entire device display). In other configurations, the coordinate system may occupy a subsection of the user interface 6628 a. Although polar coordinate systems are given as examples herein, it should be noted that alternative coordinate systems, such as rectangular coordinate systems, may be included in the user interface 6628 a.

FIG. 94 illustrates another example of a user interface 9428. In thisexample, the user interface 9428 includes a rectangular (e.g.,Cartesian) coordinate system 9430. One example of an audio signalindicator 9446 is also shown. As described above, the coordinate system9430 may occupy the entire user interface 9428 (and/or an entire display9464 included in an electronic device 6202) as illustrated in FIG. 94.In other configurations, the coordinate system 9430 may occupy asubsection of the user interface 9428 (and/or display 9464). It shouldbe noted that a rectangular coordinate system may be implementedalternatively from any of the polar coordinate systems described herein.

FIG. 67 illustrates another example of the user interface 6728 for displaying a directionality of at least one audio signal. In some implementations, the user interface 6728 may be an example of the user interface 6228 described in connection with FIG. 62. The user interface may include a coordinate system 6730 and at least one audio signal indicator 6746 a-b that may be examples of corresponding elements described in connection with one or more of FIGS. 62 and 66. In FIG. 67, the user interface 6728 may include multiple audio signal indicators 6746 a-b. For example, a first audio signal indicator 6746 a may indicate that a first audio signal source 6715 a is at approximately 90 degrees, and a second audio signal indicator 6746 b may indicate that a second audio signal source 6715 b is at approximately 270 degrees. For example, FIG. 67 illustrates one example of voice detection to the left and right of an electronic device that includes the user interface 6728. More specifically, the user interface 6728 may indicate voices detected from the left and right of an electronic device. For instance, the user interface 6728 may display multiple (e.g., two) different sources at the same time in different locations. In some configurations, the procedures described in connection with FIG. 78 below may enable selecting two sectors corresponding to the audio signal indicators 6746 a-b (and to the audio signal sources 6715 a-b, for example).

FIG. 68 illustrates another example of the user interface 6828 fordisplaying a directionality of at least one audio signal. In someimplementations, the user interface 6828 may be an example of the userinterface 6228 described in connection with FIG. 62. The user interfacemay include a coordinate system 6830, and an audio signal indicator 6846that may be examples of corresponding elements described in connectionwith one or more of FIGS. 62 and 66. FIG. 68 illustrates one example ofa two-dimensional coordinate system 6830 being projected intothree-dimensional display space, where the coordinate system 6830appears to extend inward into the user interface 6828. For instance, anelectronic device 6202 (e.g., phone) may be in the palm of a user'shand. In particular, the electronic device 6202 may be in a horizontalface-up orientation. In this example, a part of the user interface 6828may be aligned with a horizontal reference plane as described earlier.The audio signal in FIG. 68 may originate from a user that is holdingthe electronic device 6202 in their hands and speaking in front of it(at roughly 180 degrees, for instance).

FIG. 69 illustrates another example of the user interface 6928 fordisplaying a directionality of at least one audio signal. In someimplementations, the user interface 6928 may be an example of the userinterface 6228 described in connection with FIG. 62. The user interfacemay include a coordinate system 6930 and an audio signal indicator 6946that may be examples of corresponding elements described in connectionwith one or more of FIGS. 62 and 66. In FIG. 69, the electronic device6202 (e.g., phone) may be in the palm of a user's hand. For example, theelectronic device 6202 may be in a horizontal face-up orientation. Inthis example, a part of the user interface 6928 may be aligned with ahorizontal reference plane as described earlier. The audio signal inFIG. 69 may originate from behind the electronic device 6202 (at roughly0 degrees, for instance).

FIG. 70 illustrates another example of the user interface 7028 fordisplaying a directionality of at least one audio signal. In someimplementations, the user interface 7028 may be an example of the userinterface 6228 described in connection with FIG. 62. The user interfacemay include a coordinate system 7030 and at least one audio signalindicator 7046 a-b that may be examples of corresponding elementsdescribed in connection with one or more of FIG. 62 and FIG. 66. In someconfigurations, the user interface 7028 may include at least one icon7048 a-b corresponding to the type of audio signal indicator 7046 a-bthat is displayed. For example, the user interface 7028 may display atriangle icon 7048 a next to a first audio signal indicator 7046 a thatcorresponds to a target audio signal (e.g., a speaker's or user'svoice). Similarly, the user interface 7028 may display a diamond icon7048 b next to a second audio signal indicator 7046 b that correspondsto interference (e.g., an interfering audio signal or noise).

FIG. 71 illustrates an example of the sector selection feature 6232 ofthe user interface 7128. In some implementations, the user interface7128 may be an example of the user interface 6228 described inconnection with FIG. 62. The user interface 7128 may include acoordinate system 7130 and/or an audio signal indicator 7146 that may beexamples of corresponding elements described in connection with one ormore of FIGS. 62 and 66. As described above, the user interface 7128 mayinclude a sector selection feature 6232 that allows selection of atleast one sector, by touch input for example. In FIG. 71, a selectedsector 7150 is indicated by the dashed line. In some implementations,the angle range of a selected sector 7150 may also be displayed (e.g.,approximately 225 degrees to approximately 315 degrees as shown in FIG.71). As described earlier, in some implementations, the electronicdevice 6202 may pass the audio signal (e.g., represented by the audiosignal indicator 7146) indicated within the selected sector 7150. Inthis example, the audio signal source is to the side of the phone (atapproximately 270 degrees). In some configurations, the other sector(s)outside of the selected sector 7150 may be noise suppressed and/orattenuated.

In the example illustrated in FIG. 71, the user interface 7128 includesa target sector indicator. The target sector indicator indicates aselected sector between 225 and 315 degrees in this case. It should benoted that sectors may be indicated with other parameters in otherconfigurations. For instance, the target sector indicator may indicate aselected sector in radians, according to a sector number, etc.
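As a rough illustration of passing the audio signal inside a selected sector while attenuating signals outside of it, the following Python sketch applies a gain per frame based on each frame's estimated direction of arrival. The names in_sector, apply_sector_gain and the attenuation_db parameter are assumptions for this sketch, not elements of the disclosed systems.

```python
def in_sector(doa_deg, sector):
    """Return True if a direction of arrival (degrees) falls inside a sector.

    sector is a (start_deg, end_deg) pair; the test handles sectors that
    wrap past 0/360 degrees.
    """
    start, end = (s % 360.0 for s in sector)
    doa = doa_deg % 360.0
    if start <= end:
        return start <= doa <= end
    return doa >= start or doa <= end

def apply_sector_gain(frames, doas_deg, selected_sector, attenuation_db=-20.0):
    """Pass frames whose DOA lies inside the selected sector; attenuate the rest."""
    gain = 10.0 ** (attenuation_db / 20.0)
    out = []
    for frame, doa in zip(frames, doas_deg):
        if in_sector(doa, selected_sector):
            out.append(frame)                      # target: passed unchanged
        else:
            out.append([s * gain for s in frame])  # outside sector: attenuated
    return out

# Example: the 225-315 degree sector of FIG. 71; the 270 degree frame is
# passed and the 90 degree frame is attenuated by 20 dB.
processed = apply_sector_gain(
    frames=[[0.1, 0.2], [0.3, 0.1]],
    doas_deg=[270.0, 90.0],
    selected_sector=(225.0, 315.0),
)
```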

FIG. 72 illustrates another example of the sector selection feature 6232 of the user interface 7228. In some implementations, the user interface 7228 may be an example of the user interface 6228 described in connection with FIG. 62. The user interface 7228 may include a coordinate system 7230, an audio signal indicator 7246 and at least one selected sector 7250 a-b that may be examples of corresponding elements described in connection with at least one of FIGS. 62, 66 and 71. As described above, the sector selection feature 6232 may allow selection of multiple sectors at the same time. In FIG. 72, two sectors 7250 a-b have been selected (as indicated by the dashed lines, for instance). In this example, the audio signal is at roughly 270 degrees. The other sector(s) outside of the selected sectors 7250 a-b may be noise suppressed and/or attenuated. Thus, the systems and methods disclosed herein may enable the selection of two or more sectors 7250 at once.

FIG. 73 illustrates another example of the sector selection feature 6232 of the user interface 7328. In some implementations, the user interface 7328 may be an example of the user interface 6228 described in connection with FIG. 62. The user interface 7328 may include a coordinate system 7330, at least one audio signal indicator 7346 a-b and at least one selected sector 7350 a-b that may be examples of corresponding elements described in connection with at least one of FIGS. 62, 66 and 71. In FIG. 73, two sectors 7350 a-b have been selected (as indicated by the dashed lines, for instance). In this example, the speaker is to the side of the electronic device 6202. The other sector(s) outside of the selected sectors 7350 a-b may be noise suppressed and/or attenuated.

FIG. 74 illustrates more examples of the sector selection feature 6232 of the user interfaces 7428 a-f. In some implementations, the user interfaces 7428 a-f may be examples of the user interface 6228 described in connection with FIG. 62. The user interfaces 7428 a-f may include coordinate systems 7430 a-f, at least one audio signal indicator 7446 a-f and at least one selected sector 7450 a-c that may be examples of corresponding elements described in connection with at least one of FIGS. 62, 66 and 71. In this example, the selected sector(s) 7450 a-c may be determined based on the touch input 7452. For instance, the sectors and/or sector angles may be selected based upon finger swipes. For example, a user may input a circular touch input 7452. A selected sector 7450 b may then be determined based on the circular touch input 7452. In other words, a user may narrow a sector by drawing the region of interest instead of manually adjusting (based on touch points or “handles,” for instance). In some implementations, if multiple sectors are selected based on the touch input 7452, then the “best” sector 7450 c may be selected and readjusted to match the region of interest. In some implementations, the term “best” may indicate a sector with the strongest at least one audio signal. This may be one user-friendly way to select and narrow sector(s). It should be noted that for magnifying or shrinking a sector, multiple fingers (e.g., two or more) can be used at the same time on or above the screen. Other examples of touch input 7452 may include a tap input from a user. In this case, a user may tap a portion of the coordinate system and a sector may be selected that is centered on the tap location (or aligned to a pre-set degree range). The user may then edit the sector by switching to an editable mode and adjusting the touch points, as will be described below.
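One possible way to map a drawn gesture to a sector, and to pick the “best” (strongest) sector when several are touched, is sketched below in Python. The functions touch_path_to_angle_range and pick_best_sector, and the use of per-sector energies, are hypothetical simplifications; wrap-around near 0/360 degrees is ignored for brevity.

```python
import math

def touch_path_to_angle_range(points, center):
    """Convert a drawn touch path into an angular range on the polar display.

    points and center are (x, y) screen coordinates; the result is the
    (min_deg, max_deg) span covered by the gesture.
    """
    angles = [
        math.degrees(math.atan2(y - center[1], x - center[0])) % 360.0
        for x, y in points
    ]
    return min(angles), max(angles)

def pick_best_sector(candidate_sectors, sector_energies):
    """Among sectors overlapped by the gesture, keep the strongest one."""
    return max(candidate_sectors, key=lambda s: sector_energies.get(s, 0.0))

# Example: a rough circle drawn to the right of the display center.
path = [(300, 230), (320, 240), (300, 250), (280, 240)]
span = touch_path_to_angle_range(path, center=(160, 240))
best = pick_best_sector([(225, 315), (135, 225)],
                        {(225, 315): 0.8, (135, 225): 0.2})
```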

FIG. 75 illustrates more examples of the sector selection feature 6232 of the user interfaces 7528 a-f. In some implementations, the user interfaces 7528 a-f may be examples of the user interface 6228 described in connection with FIG. 62. The user interfaces 7528 a-f may include coordinate systems 7530 a-f, at least one audio signal indicator 7546 a-f and at least one selected sector 7550 a-c that may be examples of corresponding elements described in connection with at least one of FIGS. 62, 66 and 71. In this example, the selected sector(s) 7550 a-c may be determined based on the touch input 7552. For instance, the sectors and/or sector angles may be selected based upon finger swipes. For example, a user may input a swipe touch input 7552. In other words, a user may narrow a sector by drawing the region of interest instead of manually adjusting (based on touch points or “handles,” for instance). In this example, sector(s) may be selected and/or adjusted based on just a swipe touch input 7552 (instead of a circular drawing, for instance). A selected sector 7550 b may then be determined based on the swipe touch input 7552. In some implementations, if multiple sectors are selected based on the touch input 7552, then the “best” sector 7550 c may be selected and readjusted to match the region of interest. In some implementations, the term “best” may indicate a sector with the strongest at least one audio signal. This may be one user-friendly way to select and narrow sector(s). It should be noted that for magnifying or shrinking a sector, multiple fingers (e.g., two or more) can be used at the same time on or above the screen. It should be noted that a single finger or multiple fingers may be sensed in accordance with any of the sector selection and/or adjustment techniques described herein.

FIG. 76 is a flow diagram illustrating one configuration of a method7600 for editing a sector. The method 7600 may be performed by theelectronic device 6202. The electronic device 6202 (e.g., display 6264)may display 7602 at least one point (e.g., touch point) corresponding toat least one sector. In some implementations, the at least one touchpoint may be implemented by the sector editing feature 6436 to allowediting of at least one sector. For example, the user interface 6228 mayinclude at least one touch point that allows a user to adjust the size(e.g., expand or narrow) of a selected sector. The touch points may bedisplayed around the borders of the sectors.

The electronic device 6202 (e.g., a touch sensor) may receive 7604 atouch input corresponding to the at least one point (e.g., touch point).For example, the electronic device 6202 may receive a touch input thatedits a sector (e.g., adjusts its size and/or shape). For instance, auser may select at least one touch point by touching them. In thisexample, a user may move touch points displayed on the user interface6228. In this implementation, receiving 7604 a touch input may includeadjusting the touch points based on the touch input. For example, as auser moves the touch points via the touch sensor 6438, the electronicdevice 6202 may move the touch points accordingly.

The electronic device 6202 (e.g., user interface 6228) may edit 7606 theat least one sector based on the touch input. For example, theelectronic device 6202 may adjust the size and/or shape of the sectorbased on the single or multi-touch input. Similarly, the electronicdevice 6202 may change the position of the sector relative to thecoordinate system 6230 based on the touch input.
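The following Python sketch illustrates, under simplifying assumptions, how a dragged touch point (handle) might adjust a sector's border angle while enforcing a minimum sector width. The names touch_points_for_sector, edit_sector and min_width_deg are illustrative only and do not correspond to elements of the disclosed method 7600.

```python
def touch_points_for_sector(sector):
    """Return the handle angles displayed at the borders of a sector."""
    start_deg, end_deg = sector
    return [start_deg, end_deg]

def edit_sector(sector, handle_index, dragged_to_deg, min_width_deg=10.0):
    """Move one border handle to a new angle while keeping a minimum width."""
    start, end = sector
    if handle_index == 0:
        start = min(dragged_to_deg, end - min_width_deg)
    else:
        end = max(dragged_to_deg, start + min_width_deg)
    return (start, end)

# Example: narrow the 225-315 degree sector by dragging its second handle
# from 315 degrees to 300 degrees.
sector = (225.0, 315.0)
sector = edit_sector(sector, handle_index=1, dragged_to_deg=300.0)  # (225.0, 300.0)
```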

FIG. 77 illustrates examples of a sector editing feature 6436 of theuser interfaces 7728 a-b. In some implementations, the user interfaces7728 a-b may be examples of the user interface 6228 described inconnection with FIG. 62. The user interfaces 7728 a-b may includecoordinate systems 7730 a-b that may be examples of correspondingelements described in connection with FIG. 62. The user interfaces 7728a-b may include at least one touch point 7754 a-h. As described above,the touch points 7754 a-h may be handles that allow editing of at leastone sector. The touch points 7754 a-h may be positioned at the apexes ofthe sectors. In some implementations, sector editing may be doneindependent of sector selection. Accordingly, a sector that is notselected may be adjusted in some configurations.

In some implementations, the user interfaces 7728 a-b may provide aninteractive control that enables a fixing mode and an editing mode ofthe user interfaces 7728 a-b. For example, the user interfaces 7728 a-bmay each include an activation/deactivation button 7756 a-b thatcontrols whether the user interface 7728 a-b is operable. Theactivation/deactivation buttons 7756 a-b may toggleactivated/deactivated states for the user interfaces 7728 a-b. While inan editable mode, the user interfaces 7728 a-b may display at least onetouch point 7754 a-f (e.g., handles) corresponding to at least onesector (e.g., the circles at the edges of the sectors).

FIG. 78 illustrates more examples of the sector editing feature 6436 of the user interfaces 7828 a-c. In some implementations, the user interfaces 7828 a-c may be examples of the user interface 6228 described in connection with FIG. 62. The user interfaces 7828 a-c may include coordinate systems 7830 a-c, at least one audio signal indicator 7846 a-b, at least one selected sector 7850 a-e and at least one touch point 7854 a-l that may be examples of corresponding elements described in connection with at least one of FIGS. 62, 66 and 71. In FIG. 78, at least one sector has been selected (as illustrated by the dashed lines, for instance). As depicted in FIG. 78, the selected sectors 7850 a-e may be narrowed for more precision. For example, a user may use the touch points 7854 a-l to adjust (e.g., expand and narrow) the selected sectors 7850 a-e. The other sector(s) outside of the selected sectors 7850 a-e may be noise suppressed and/or attenuated.

FIG. 79 illustrates more examples of the sector editing feature 6436 ofthe user interfaces 7928 a-b. In some implementations, the userinterfaces 7928 a-b may be examples of the user interface 6228 describedin connection with FIG. 62. The user interfaces 7928 a-b may includecoordinate systems 7930 a-b, at least one audio signal indicator 7946a-b, at least one selected sector 7950 a-b and at least one touch point7954 a-h that may be examples of corresponding elements described inconnection with at least one of FIGS. 62, 66 and 71. In FIG. 79, theelectronic device 6202 (e.g., phone) may be in the palm of a user'shand. For example, the electronic device 6202 may be tilted upward. Inthis example, a part of the user interfaces 7928 a-b (e.g., thecoordinate systems 7930 a-b) may be aligned with a horizontal referenceplane as described earlier. Accordingly, the coordinate systems 7930 a-bappear in a three-dimensional perspective extending into the userinterfaces 7928 a-b. The audio signal in FIG. 79 may originate from auser that is holding the electronic device 6202 in their hands andspeaking in front of it (at roughly 180 degrees, for instance). FIG. 79also illustrates that at least one sector can be narrowed or widened inreal-time. For instance, a selected sector 7950 a-b may be adjustedduring an ongoing conversation or phone call.

FIG. 80 illustrates more examples of the sector editing feature 6436 ofthe user interfaces 8028 a-c. In some implementations, the userinterfaces 8028 a-c may be examples of the user interface 6228 describedin connection with FIG. 62. The user interfaces 8028 a-c may includecoordinate systems 8030 a-c, at least one audio signal indicator 8046a-c, at least one selected sector 8050 a-b and at least one touch point8054 a-b that may be examples of corresponding elements described inconnection with at least one of FIGS. 62, 66 and 71. The firstillustration depicts an audio signal indicator 8046 a indicating thepresence of an audio signal at approximately 270 degrees. The middleillustration shows a user interface 8028 b with a selected sector 8050a. The right illustration depicts one example of editing the selectedsector 8050 b. In this case, the selected sector 8050 b is narrowed. Inthis example, an electronic device 6202 may pass the audio signals thathave a direction of arrival associated with the selected sector 8050 band attenuate other audio signals that have a direction of arrivalassociated with the outside of the selected sector 8050 b.

FIG. 81 illustrates more examples of the sector editing feature 6436 ofthe user interfaces 8128 a-d. In some implementations, the userinterfaces 8128 a-d may be examples of the user interface 6228 describedin connection with FIG. 62. The user interfaces 8128 a-d may includecoordinate systems 8130 a-d, at least one audio signal indicator 8146a-d, at least one selected sector 8150 a-c and at least one touch point8154 a-h that may be examples of corresponding elements described inconnection with at least one of FIGS. 62, 66 and 71. The firstillustration depicts an audio signal indicator 8146 a indicating thepresence of an audio signal at approximately 270 degrees. The secondillustration shows a user interface 8128 b with a selected sector 8150a. The third illustration shows at least one touch point 8154 a-d usedfor editing a sector. The fourth illustration depicts one example ofediting the selected sector 8150 d. In this case, the selected sector8150 d is narrowed. In this example, an electronic device 6202 may passthe audio signals that have a direction of arrival associated with theselected sector 8150 d (e.g., that may be based on user input) andattenuate other audio signals that have a direction of arrivalassociated with the outside of the selected sector 8150 d.

FIG. 82 illustrates an example of the user interface 8228 with acoordinate system 8230 oriented independent of electronic device 6202orientation. In some implementations, the user interface 8228 may be anexample of the user interface 6228 described in connection with FIG. 62.The user interface includes a coordinate system 8230, and an audiosignal indicator 8246 that may be examples of corresponding elementsdescribed in connection with at least one of FIGS. 62 and 66. In FIG.82, the electronic device 6202 (e.g., phone) is tilted upward (in thepalm of a user's hand, for example). The coordinate system 8230 (e.g.,the polar graph) of the user interface 8228 shows or displays the audiosignal source location. In this example, a part of the user interface8228 is aligned with a horizontal reference plane as described earlier.The audio signal in FIG. 82 originates from a source 8215 at roughly 180degrees. As described above, a source 8215 may include a user (that isholding the electronic device 6202 in their hand and speaking in frontof it, for example), a speaker, or anything that is capable ofgenerating an audio signal.

FIG. 83 illustrates another example of the user interface 8328 with acoordinate system 8330 oriented independent of electronic device 6202orientation. In some implementations, the user interface 8328 may be anexample of the user interface 6228 described in connection with FIG. 62.The user interface 8328 includes a coordinate system 8330 and an audiosignal indicator 8346 that may be examples of corresponding elementsdescribed in connection with at least one of FIGS. 62 and 66. In FIG.83, the electronic device 6202 (e.g., phone) is in a slanted or tiltedorientation (in the palm of a user's hand, for example) increasing inelevation from the bottom of the electronic device 6202 to the top ofthe electronic device 6202 (towards the sound source 8315). Thecoordinate system 8330 (e.g., the polar graph) of the user interface8328 displays the audio signal source location. In this example, a partof the user interface 8328 is aligned with a horizontal reference planeas described earlier. The audio signal in FIG. 83 originates from asource 8315 that is toward the back of (or behind) the electronic device6202 (e.g., the phone). FIG. 83 illustrates that the reference plane ofthe user interface 8328 is aligned with the physical plane (e.g.,horizontal) of the 3D world. Note that in FIG. 83, the user interface8328 plane goes into the screen, even though the electronic device 6202is being held semi-vertically. Thus, even though the electronic device6202 is at approximately 45 degrees relative to the physical plane ofthe floor, the user interface 8328 coordinate system 8330 plane is at 0degrees relative to the physical plane of the floor. For example, thereference plane on the user interface 8328 corresponds to the referenceplane in the physical coordinate system.
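A minimal sketch of aligning the drawn coordinate system with a horizontal reference plane is given below, assuming an accelerometer gravity vector is available as sensor data. The functions device_tilt_deg and plane_rotation_for_display, and the specific tilt formula, are assumptions for illustration rather than the disclosed mapping.

```python
import math

def device_tilt_deg(accel):
    """Angle between the device's screen normal and gravity.

    accel is an (x, y, z) accelerometer reading; roughly 0 degrees when the
    device lies flat face-up and roughly 90 degrees when held vertically.
    """
    ax, ay, az = accel
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    return math.degrees(math.acos(max(-1.0, min(1.0, az / norm))))

def plane_rotation_for_display(accel):
    """Rotation (degrees) to apply to the drawn polar graph so that it stays
    aligned with the physical horizontal plane as the device tilts."""
    return -device_tilt_deg(accel)

# Example: a device tilted roughly 45 degrees, as in FIG. 83.
rotation = plane_rotation_for_display((0.0, 6.9, 6.9))  # about -45 degrees
```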

FIG. 84 illustrates another example of the user interface 8428 with acoordinate system 8430 oriented independent of electronic device 6202orientation. In some implementations, the user interface 8428 may be anexample of the user interface 6228 described in connection with FIG. 62.The user interface 8428 includes a coordinate system 8430 and an audiosignal indicator 8446 that may be examples of corresponding elementsdescribed in connection with at least one of FIGS. 62 and 66. In FIG.84, the electronic device 6202 (e.g., phone) is in a verticalorientation (in the palm of a user's hand, for example). The coordinatesystem 8430 (e.g., the polar graph) of the user interface 8428 displaysthe audio signal source location. In this example, a part of the userinterface 8428 is aligned with a horizontal reference plane as describedearlier. The audio signal in FIG. 84 originates from a source 8415 thatis toward the back left of (e.g., behind) the electronic device 6202(e.g., the phone).

FIG. 85 illustrates another example of the user interface 8528 with acoordinate system 8530 oriented independent of electronic device 6202orientation. In some implementations, the user interface 8528 may be anexample of the user interface 6228 described in connection with FIG. 62.The user interface 8528 includes a coordinate system 8530 and an audiosignal indicator 8546 that may be examples of corresponding elementsdescribed in connection with at least one of FIGS. 62 and 66. In FIG.85, the electronic device 6202 (e.g., phone) is in a horizontal face-uporientation (e.g., a tabletop mode). The coordinate system 8530 (e.g.,the polar graph) of the user interface 8528 displays the audio signalsource location. The audio signal in FIG. 85 may originate from a source8515 that is toward the top left of the electronic device 6202 (e.g.,the phone). In some examples, the audio signal source is tracked. Forexample, when noise suppression is enabled, the electronic device 6202may track the loudest speaker or sound source. For instance, theelectronic device 6202 (e.g., phone) may track the movements of aloudest speaker while suppressing other sounds (e.g., noise) from otherareas (e.g., zones or sectors).
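For illustration, the following Python sketch tracks the loudest source across frames by smoothing the direction of the highest-energy candidate in each frame. The function track_loudest_source and the smoothing parameter are hypothetical, and wrap-around at 0/360 degrees is ignored for brevity.

```python
def track_loudest_source(frame_doas, frame_energies, smoothing=0.9):
    """Follow the direction of the loudest source across frames.

    frame_doas and frame_energies are per-frame lists of candidate directions
    (degrees) and their energies; the tracked angle is exponentially smoothed.
    """
    tracked = None
    for doas, energies in zip(frame_doas, frame_energies):
        if not doas:
            continue
        loudest = doas[max(range(len(energies)), key=energies.__getitem__)]
        tracked = loudest if tracked is None else (
            smoothing * tracked + (1.0 - smoothing) * loudest
        )
    return tracked

# Example: a talker drifting from 180 toward 200 degrees over three frames.
angle = track_loudest_source(
    [[180.0, 90.0], [190.0, 90.0], [200.0, 90.0]],
    [[1.0, 0.2], [1.0, 0.2], [1.0, 0.2]],
)
```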

FIG. 86 illustrates more examples of the user interfaces 8628 a-c with coordinate systems 8630 a-c oriented independent of electronic device 6202 orientation. In other words, the coordinate systems 8630 a-c and/or the audio signal indicators 8646 a-c remain at the same orientation relative to physical space, independent of how the electronic device 6202 is rotated. In some implementations, the user interfaces 8628 a-c may be examples of the user interface 6228 described in connection with FIG. 62. The user interfaces 8628 a-c may include coordinate systems 8630 a-c and audio signal indicators 8646 a-c that may be examples of corresponding elements described in connection with at least one of FIGS. 62 and 66. Without a compass, the sector selection feature 6232 may not have an association with the physical coordinate system of the real world (e.g., north, south, east, west, etc.). Accordingly, if the electronic device 6202 (e.g., phone) is in a vertical orientation facing the user (e.g., a browse-talk mode), the top of the electronic device 6202 may be designated as “0 degrees,” which runs along a vertical axis. When the electronic device 6202 is rotated, for example by 90 degrees in a clockwise direction, “0 degrees” is then located on a horizontal axis. Thus, when a sector is selected, rotation of the electronic device 6202 affects the selected sector. By adding another component that can detect direction, for example a compass, the sector selection feature 6232 of the user interfaces 8628 a-c can be made relative to physical space rather than to the phone. In other words, by adding a compass, when the phone is rotated from a vertically upright position to a horizontal position, “0 degrees” still remains on the top side of the phone that is facing the user. For example, in the first image of FIG. 86, the user interface 8628 a is illustrated without tilt (or with 0 degrees of tilt, for instance). In this case, the coordinate system 8630 a is aligned with the user interface 8628 a and/or the electronic device 6202. By comparison, in the second image of FIG. 86, the user interface 8628 b and/or electronic device 6202 are tilted to the left. However, the coordinate system 8630 b (and the mapping between the real world and the electronic device 6202) may be maintained. This may be done based on tilt sensor data 5608, for example. In the third image of FIG. 86, the user interface 8628 c and/or electronic device 6202 are tilted to the right. However, the coordinate system 8630 c (and the mapping between the real world and the electronic device 6202) may be maintained.
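A minimal sketch of keeping the displayed coordinate system fixed relative to physical space is given below, assuming a compass (or equivalent sensor) reports the device heading in degrees. The functions grid_rotation_deg and physical_bearing_deg and their sign conventions are assumptions made for illustration.

```python
def grid_rotation_deg(device_heading_deg):
    """Counter-rotation for the drawn coordinate grid so that its zero-degree
    axis keeps pointing at the same physical direction as the device turns."""
    return -device_heading_deg % 360.0

def physical_bearing_deg(doa_relative_to_device_deg, device_heading_deg):
    """Convert a device-relative direction of arrival into a bearing that is
    fixed in physical space (e.g., relative to magnetic north)."""
    return (doa_relative_to_device_deg + device_heading_deg) % 360.0

# Example: with the device rotated 90 degrees clockwise (heading = 90), a
# talker reported at a device-relative 90 degrees maps to the same fixed
# bearing of 180 degrees that was shown before the rotation.
bearing = physical_bearing_deg(90.0, 90.0)  # 180.0
```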

It should be noted that as used herein, the term “physical coordinates”may or may not denote geographic coordinates. In some configurations,for example, where the electronic device 6202 does not include acompass, the electronic device 6202 may still map coordinates from amulti-microphone configuration to physical coordinates based on sensordata 5608. In this case, the mapping 5612 may be relative to theelectronic device 6202 and may not directly correspond to earthcoordinates (e.g., north, south, east, west). Regardless, the electronicdevice 6202 may be able to discriminate the direction of sounds inphysical space relative to the electronic device 6202. In someconfigurations, however, the electronic device 6202 may include acompass (or other navigational instrument). In this case, the electronicdevice 6202 may map coordinates from a multi-microphone configuration tophysical coordinates that correspond to earth coordinates (e.g., north,south, east, west). Different types of coordinate systems 6230 may beutilized in accordance with the systems and methods disclosed herein.

FIG. 87 illustrates another example of the user interface 8728 with a coordinate system 8730 oriented independent of electronic device 6202 orientation. In some implementations, the user interface 8728 may be an example of the user interface 6228 described in connection with FIG. 62. The user interface 8728 may include a coordinate system 8730 and an audio signal indicator 8746 that may be examples of corresponding elements described in connection with at least one of FIGS. 62 and 66. In some implementations, the user interface 8728 also includes a compass 8756 in conjunction with the coordinate system 8730 (as described above). In this implementation, the compass 8756 may detect direction. The compass 8756 portion may display an electronic device 6202 orientation relative to real-world coordinates. Via the compass 8756, the sector selection feature 6232 on the user interface 8728 may be relative to physical space, and not the electronic device 6202. In other words, by adding a compass 8756, when the electronic device 6202 is rotated from a vertical position to a horizontal position, “0 degrees” still remains near the top side of the electronic device 6202 that is facing the user. It should be noted that determining the physical orientation of the electronic device 6202 can be done with a compass 8756. However, if a compass 8756 is not present, the orientation may alternatively be determined based on GPS and/or gyro sensors. Accordingly, any sensor 5604 or system that may be used to determine the physical orientation of an electronic device 6202 may be used alternatively from or in addition to a compass 8756. Thus, a compass 8756 may be substituted with another sensor 5604 or system in any of the configurations described herein. In other words, multiple sensors 5604 can be used to keep the displayed orientation fixed relative to the user.

In the case where a GPS receiver is included in the electronic device6202, GPS data may be utilized to provide additional functionality (inaddition to just being a sensor). In some configurations, for example,the electronic device 6202 (e.g., mobile device) may include GPSfunctionality with map software. In one approach, the coordinate system8730 may be aligned such that zero degrees always points down a street,for example. With the compass 8756, for instance, the electronic device6202 (e.g., the coordinate system 8730) may be oriented according to aphysical north and/or south, whereas GPS functionality may be utilizedto provide more options.

FIG. 88 is a block diagram illustrating another configuration of a userinterface 8828 in which systems and methods for displaying a userinterface 8828 on an electronic device 8802 may be implemented. The userinterface 8828 may be an example of the user interface 6228 described inconnection with FIG. 62. In some implementations, the user interface8828 may be presented on a display 8864 of the electronic device 8802that may be examples of corresponding elements described in connectionwith FIG. 62. The user interface 8828 may include a coordinate system8830 and/or a sector selection feature 8832 that may be examples ofcorresponding elements described in connection with at least one ofFIGS. 62 and 66. The user interface 8828 may be coupled to at least onemicrophone 8806 and/or an operation block/module 8814 that may beexamples of corresponding elements described in connection with at leastone of FIGS. 56 and 66.

In some implementations, the user interface 8828 may be coupled to adatabase 8858 that may be included and/or coupled to the electronicdevice 8802. For example, the database 8858 may be stored in memorylocated on the electronic device 8802. The database 8858 may include oneor more audio signatures. For example, the database 8858 may include oneor more audio signatures pertaining to one or more audio signal sources(e.g., individual users). The database 8858 may also include informationbased on the audio signatures. For example, the database 8858 mayinclude identification information for the users that correspond to theaudio signatures. Identification information may include images of theaudio signal source (e.g., an image of a person corresponding to anaudio signature) and/or contact information, such as name, emailaddress, phone number, etc.

In some implementations, the user interface 8828 may include an audiosignature recognition block/module 8860. The audio signature recognitionblock/module 8860 may recognize audio signatures received by the atleast one microphone 8806. For example, the microphones 8806 may receivean audio signal. The audio signature recognition block/module 8860 mayobtain the audio signal and compare it to the audio signatures includedin the database 8858. In this example, the audio signature recognitionblock/module 8860 may obtain the audio signature and/or identificationinformation pertaining to the audio signature from the database 8858 andpass the identification information to the display 8864.

FIG. 89 is a flow diagram illustrating another configuration of a method8900 for displaying a user interface 8828 on an electronic device 8802.The method 8900 may be performed by the electronic device 8802. Theelectronic device 8802 may obtain 8902 a coordinate system 8830 thatcorresponds to physical coordinates. In some implementations, this maybe done as described in connection with FIG. 63.

The electronic device 8802 may present 8904 the user interface 8828 thatmay include the coordinate system 8830. In some implementations, thismay be done as described in connection with FIG. 63.

The electronic device 8802 may recognize 8906 an audio signature. An audio signature may be a characterization that corresponds to a particular audio signal source. For example, an individual user may have an audio signature that corresponds to that individual's voice. Examples of audio signatures include voice recognition parameters, audio signal components, audio signal samples and/or other information for characterizing an audio signal. In some implementations, the electronic device 8802 may receive an audio signal from at least one microphone 8806. The electronic device 8802 may then recognize 8906 the audio signature, for example, by determining whether the audio signal is from an audio signal source, such as an individual user, as compared to a noise signal. This may be done by measuring at least one characteristic of the audio signal (e.g., harmonicity, pitch, etc.). In some implementations, recognizing 8906 an audio signature may include identifying an audio signal as coming from a particular audio source.

The electronic device 8802 may then look up 8908 the audio signature inthe database 8858. For example, the electronic device 8802 may look forthe audio signature in the database 8858 of audio signatures. Theelectronic device 8802 may obtain 8910 identification informationcorresponding to the audio signature. As described above, the database8858 may include information based on the audio signatures. For example,the database 8858 may include identification information for the usersthat correspond to the audio signatures. Identification information mayinclude images of the audio signal source (e.g., the user) and/orcontact information, such as name, email address, phone number, etc.After obtaining 8910 the identification information (e.g., the image)corresponding to the audio signature, the electronic device 8802 maydisplay 8912 the identification information on the user interface 8828.For example, the electronic device 8802 may display 8912 an image of theuser next to the audio signal indicator 6646 on the display 6264. Inother implementations, the electronic device 8802 may display 8912 atleast one identification information element as part of anidentification display. For example, a portion of the user interface8828 may include the identification information (e.g., image, name,email address etc.) pertaining to the audio signature.
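As a simplified illustration of looking up an audio signature and retrieving identification information, the following Python sketch compares a measured feature vector against stored signatures using cosine similarity. The function recognize_audio_signature, the feature representation and the threshold value are assumptions, not the disclosed recognition method.

```python
def recognize_audio_signature(features, database, threshold=0.8):
    """Find the stored audio signature closest to a measured feature vector.

    database maps a signature id to (signature_features, identification_info);
    returns the identification info (e.g., name, image path) or None when no
    stored signature is similar enough.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    best_id, best_score = None, 0.0
    for sig_id, (sig_features, _info) in database.items():
        score = cosine(features, sig_features)
        if score > best_score:
            best_id, best_score = sig_id, score
    if best_id is not None and best_score >= threshold:
        return database[best_id][1]
    return None

# Example: one known talker whose identification information could be shown
# next to the audio signal indicator when the measured features match.
db = {"user_1": ([0.2, 0.7, 0.1], {"name": "Alice", "image": "alice.png"})}
info = recognize_audio_signature([0.21, 0.69, 0.12], db)
```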

The electronic device 8802 may provide 8914 a sector selection feature6232 that allows selection of at least one sector of the coordinatesystem 8830. In some implementations, this may be done as described inconnection with FIG. 63.

FIG. 90 illustrates an example of the user interface 9028 coupled to thedatabase 9058. In some implementations, the user interface 9028 may bean example of the user interface 6228 described in connection with FIG.62. The user interface 9028 may include a coordinate system 9030 and anaudio signal indicator 9046 that may be examples of correspondingelements described in connection with at least one of FIGS. 62 and 66.As described above in some implementations, the user interface 9028 maybe coupled to the database 9058 that includes at least one audiosignature 9064 and/or identification information 9062 a corresponding tothe audio signature 9064 that may be examples of corresponding elementsdescribed in connection with at least one of FIGS. 88 and 89. In someconfigurations, the electronic device 6202 may recognize an audiosignature 9064 and look up the audio signature 9064 in the database9058. The electronic device 6202 may then obtain (e.g., retrieve) thecorresponding identification information 9062 a corresponding to theaudio signature 9064 recognized by the electronic device 6202. Forexample, the electronic device 6202 may obtain a picture of the speakeror person, and display the picture (and other identification information9062 b) of the speaker or person by the audio signal indicator 9046. Inthis way, a user can easily identify a source of an audio signal. Itshould be noted that the database 9058 can be local or can be remote(e.g., on a server across a network, such as a LAN or the Internet).Additionally or alternatively, the electronic device 6202 may send theidentification information 9062 to another device. For instance, theelectronic device 6202 may send one or more user names (and/or images,identifiers, etc.) to another device (e.g., smartphone, server, network,computer, etc.) that presents the identification information 9062 suchthat a far-end user is apprised of a current speaker. This may be usefulwhen there are multiple users talking on a speakerphone, for example.

Optionally, in some implementations, the user interface 9028 may displaythe identification information 9062 separate from the coordinate system9030. For example, the user interface 9028 may display theidentification information 9062 c below the coordinate system 9030.

FIG. 91 is a flow diagram illustrating another configuration of a method9100 for displaying a user interface 6428 on an electronic device 6402.The method 9100 may be performed by the electronic device 6402. Theelectronic device 6402 may obtain 9102 a coordinate system 6430 thatcorresponds to physical coordinates. In some implementations, this maybe done as described in connection with FIG. 63.

The electronic device 6402 may present 9104 the user interface 6428 thatmay include the coordinate system 6430. In some implementations, thismay be done as described in connection with FIG. 63.

The electronic device 6402 may provide 9106 a sector selection feature6432 that allows selection of at least one sector of the coordinatesystem 6430. In some implementations, this may be done as described inconnection with FIG. 63.

The electronic device 6402 may indicate 9108 image data from at leastone sector. As described above, the electronic device 6402 may includeat least one image sensor 6434. For example, several image sensors 6434that collect data relating to the electronic device 6402 may be includedon the electronic device 6402. More specifically, the at least one imagesensor 6434 may collect image data. For example, a camera (e.g., animage sensor 6434) may generate an image. In some implementations, theat least one image sensor 6434 may provide image data to the userinterface 6428. In some implementations, the electronic device 6402 mayindicate 9108 image data from the at least one image sensor 6434. Inother words, the electronic device 6402 may display image data (e.g.,still photo or video) from the at least one image sensor 6434 on thedisplay 6464.

In some implementations, the electronic device 6402 may pass 9110 imagedata based on the at least one sector. For example, the electronicdevice 6402 may pass 9110 image data indicated in a selected sector. Inother words, at least one of the techniques described herein regardingthe user interface 6428 may be applied to image data alternatively fromor in addition to audio signals.
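One speculative way to pass image data based on a selected sector is sketched below: the sector's angular range is mapped onto a horizontal pixel range of a camera frame, assuming the camera's field of view covers a known angular span of the same coordinate system. The names sector_to_column_range, pass_image_sector and camera_span_deg are illustrative assumptions only.

```python
def sector_to_column_range(sector, image_width, camera_span_deg=(240.0, 300.0)):
    """Map an angular sector onto a horizontal pixel range of a camera frame.

    camera_span_deg is the (left_deg, right_deg) part of the coordinate
    system assumed to be covered by the camera's field of view.
    """
    left_deg, right_deg = camera_span_deg
    span = right_deg - left_deg
    start = max(sector[0], left_deg)
    end = min(sector[1], right_deg)
    if end <= start:
        return None  # the sector does not overlap the camera's field of view
    x0 = int((start - left_deg) / span * image_width)
    x1 = int((end - left_deg) / span * image_width)
    return x0, x1

def pass_image_sector(frame_rows, column_range):
    """Keep only the pixel columns that fall inside the selected sector."""
    if column_range is None:
        return []
    x0, x1 = column_range
    return [row[x0:x1] for row in frame_rows]

# Example: a 250-290 degree sector maps to the middle of a 640-pixel-wide frame.
cols = sector_to_column_range((250.0, 290.0), image_width=640)  # roughly (106, 533)
```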

FIG. 92 is a block diagram illustrating one configuration of a wireless communication device 9266 in which systems and methods for mapping a source location may be implemented. The wireless communication device 9266 illustrated in FIG. 92 may be an example of at least one of the electronic devices described herein. The wireless communication device 9266 may include an application processor 9278. The application processor 9278 generally processes instructions (e.g., runs programs) to perform functions on the wireless communication device 9266. The application processor 9278 may be coupled to an audio coder/decoder (codec) 9276.

The audio codec 9276 may be an electronic device (e.g., integratedcircuit) used for coding and/or decoding audio signals. The audio codec9276 may be coupled to at least one speaker 9268, an earpiece 9270, anoutput jack 9272 and/or at least one microphone 9206. The speakers 9268may include one or more electro-acoustic transducers that convertelectrical or electronic signals into acoustic signals. For example, thespeakers 9268 may be used to play music or output a speakerphoneconversation, etc. The earpiece 9270 may be another speaker orelectro-acoustic transducer that can be used to output acoustic signals(e.g., speech signals) to a user. For example, the earpiece 9270 may beused such that only a user may reliably hear the acoustic signal. Theoutput jack 9272 may be used for coupling other devices to the wirelesscommunication device 9266 for outputting audio, such as headphones. Thespeakers 9268, earpiece 9270 and/or output jack 9272 may generally beused for outputting an audio signal from the audio codec 9276. The atleast one microphone 9206 may be an acousto-electric transducer thatconverts an acoustic signal (such as a user's voice) into electrical orelectronic signals that are provided to the audio codec 9276.

A coordinate mapping block/module 9217 a may be optionally implementedas part of the audio codec 9276. For example, the coordinate mappingblock/module 9217 a may be implemented in accordance with one or more ofthe functions and/or structures described herein. For example, thecoordinate mapping block/module 9217 a may be implemented in accordancewith one or more of the functions and/or structures described inconnection with FIGS. 57, 59, 60 and 61.

Additionally or alternatively, a coordinate mapping block/module 9217 bmay be implemented in the application processor 9278. For example, thecoordinate mapping block/module 9217 b may be implemented in accordancewith one or more of the functions and/or structures described herein.For example, the coordinate mapping block/module 9217 b may beimplemented in accordance with one or more of the functions and/orstructures described in connection with FIGS. 57, 59, 60 and 61.

The application processor 9278 may also be coupled to a power managementcircuit 9280. One example of a power management circuit 9280 is a powermanagement integrated circuit (PMIC), which may be used to manage theelectrical power consumption of the wireless communication device 9266.The power management circuit 9280 may be coupled to a battery 9282. Thebattery 9282 may generally provide electrical power to the wirelesscommunication device 9266. For example, the battery 9282 and/or thepower management circuit 9280 may be coupled to at least one of theelements included in the wireless communication device 9266.

The application processor 9278 may be coupled to at least one inputdevice 9286 for receiving input. Examples of input devices 9286 includeinfrared sensors, image sensors, accelerometers, touch sensors, keypads,etc. The input devices 9286 may allow user interaction with the wirelesscommunication device 9266. The application processor 9278 may also becoupled to one or more output devices 9284. Examples of output devices9284 include printers, projectors, screens, haptic devices, etc. Theoutput devices 9284 may allow the wireless communication device 9266 toproduce output that may be experienced by a user.

The application processor 9278 may be coupled to application memory9288. The application memory 9288 may be any electronic device that iscapable of storing electronic information. Examples of applicationmemory 9288 include double data rate synchronous dynamic random accessmemory (DDRAM), synchronous dynamic random access memory (SDRAM), flashmemory, etc. The application memory 9288 may provide storage for theapplication processor 9278. For instance, the application memory 9288may store data and/or instructions for the functioning of programs thatare run on the application processor 9278.

The application processor 9278 may be coupled to a display controller9290, which in turn may be coupled to a display 9292. The displaycontroller 9290 may be a hardware block that is used to generate imageson the display 9292. For example, the display controller 9290 maytranslate instructions and/or data from the application processor 9278into images that can be presented on the display 9292. Examples of thedisplay 9292 include liquid crystal display (LCD) panels, light emittingdiode (LED) panels, cathode ray tube (CRT) displays, plasma displays,etc.

The application processor 9278 may be coupled to a baseband processor9294. The baseband processor 9294 generally processes communicationsignals. For example, the baseband processor 9294 may demodulate and/ordecode received signals. Additionally or alternatively, the basebandprocessor 9294 may encode and/or modulate signals in preparation fortransmission.

The baseband processor 9294 may be coupled to baseband memory 9296. Thebaseband memory 9296 may be any electronic device capable of storingelectronic information, such as SDRAM, DDRAM, flash memory, etc. Thebaseband processor 9294 may read information (e.g., instructions and/ordata) from and/or write information to the baseband memory 9296.Additionally or alternatively, the baseband processor 9294 may useinstructions and/or data stored in the baseband memory 9296 to performcommunication operations.

The baseband processor 9294 may be coupled to a radio frequency (RF)transceiver 9298. The RF transceiver 9298 may be coupled to a poweramplifier 9201 and one or more antennas 9203. The RF transceiver 9298may transmit and/or receive radio frequency signals. For example, the RFtransceiver 9298 may transmit an RF signal using a power amplifier 9201and at least one antenna 9203. The RF transceiver 9298 may also receiveRF signals using the one or more antennas 9203.

FIG. 93 illustrates various components that may be utilized in anelectronic device 9302. The illustrated components may be located withinthe same physical structure or in separate housings or structures. Theelectronic device 9302 described in connection with FIG. 93 may beimplemented in accordance with at least one of the electronic devicesand the wireless communication device described herein. The electronicdevice 9302 includes a processor 9311. The processor 9311 may be ageneral purpose single- or multi-chip microprocessor (e.g., an ARM), aspecial purpose microprocessor (e.g., a digital signal processor (DSP)),a microcontroller, a programmable gate array, etc. The processor 9311may be referred to as a central processing unit (CPU). Although just asingle processor 9311 is shown in the electronic device 9302 of FIG. 93,in an alternative configuration, a combination of processors (e.g., anARM and DSP) could be used.

The electronic device 9302 also includes memory 9305 in electroniccommunication with the processor 9311. That is, the processor 9311 canread information from and/or write information to the memory 9305. Thememory 9305 may be any electronic component capable of storingelectronic information. The memory 9305 may be random access memory(RAM), read-only memory (ROM), magnetic disk storage media, opticalstorage media, flash memory devices in RAM, on-board memory includedwith the processor, programmable read-only memory (PROM), erasableprogrammable read-only memory (EPROM), electrically erasable PROM(EEPROM), registers, and so forth, including combinations thereof.

Data 9309 a and instructions 9307 a may be stored in the memory 9305.The instructions 9307 a may include at least one program, routine,sub-routine, function, procedure, etc. The instructions 9307 a mayinclude a single computer-readable statement or many computer-readablestatements. The instructions 9307 a may be executable by the processor9311 to implement at least one of the methods described above. Executingthe instructions 9307 a may involve the use of the data 9309 a that isstored in the memory 9305. FIG. 93 shows some instructions 9307 b anddata 9309 b being loaded into the processor 9311 (which may come frominstructions 9307 a and data 9309 a).

The electronic device 9302 may also include at least one communicationinterface 9313 for communicating with other electronic devices. Thecommunication interface 9313 may be based on wired communicationtechnology, wireless communication technology, or both. Examples ofdifferent types of communication interfaces 9313 include a serial port,a parallel port, a Universal Serial Bus (USB), an Ethernet adapter, anIEEE 1394 bus interface, a small computer system interface (SCSI) businterface, an infrared (IR) communication port, a Bluetooth wirelesscommunication adapter, and so forth.

The electronic device 9302 may also include at least one input device9386 and at least one output device 9384. Examples of different kinds ofinput devices 9386 include a keyboard, mouse, microphone, remote controldevice, button, joystick, trackball, touchpad, lightpen, etc. Forinstance, the electronic device 9302 may include at least one microphone9306 for capturing acoustic signals. In one configuration, a microphone9306 may be a transducer that converts acoustic signals (e.g., voice,speech) into electrical or electronic signals. Examples of differentkinds of output devices 9384 include a speaker, printer, etc. Forinstance, the electronic device 9302 may include at least one speaker9368. In one configuration, a speaker 9368 may be a transducer thatconverts electrical or electronic signals into acoustic signals. Onespecific type of output device that may be typically included in anelectronic device 9302 is a display device 9392. Display devices 9392used with configurations disclosed herein may utilize any suitable imageprojection technology, such as a cathode ray tube (CRT), liquid crystaldisplay (LCD), light-emitting diode (LED), gas plasma,electroluminescence, or the like. A display controller 9390 may also beprovided for converting data stored in the memory 9305 into text,graphics, and/or moving images (as appropriate) shown on the displaydevice 9392.

The various components of the electronic device 9302 may be coupledtogether by at least one bus, which may include a power bus, a controlsignal bus, a status signal bus, a data bus, etc. For simplicity, thevarious buses are illustrated in FIG. 93 as a bus system 9315. It shouldbe noted that FIG. 93 illustrates only one possible configuration of anelectronic device 9302. Various other architectures and components maybe utilized.

Some Figures illustrating examples of functionality and/or of the userinterface as described herein are given hereafter. In someconfigurations, the functionality and/or user interface may be referredto in connection with the phrase “Sound Focus and Source Tracking,”“SoFAST” or “SFAST.”

In the above description, reference numbers have sometimes been used inconnection with various terms. Where a term is used in connection with areference number, this may be meant to refer to a specific element thatis shown in at least one of the Figures. Where a term is used without areference number, this may be meant to refer generally to the termwithout limitation to any particular Figure.

The term “couple” and any variations thereof may indicate a direct orindirect connection between elements. For example, a first elementcoupled to a second element may be directly connected to the secondelement, or indirectly connected to the second element through anotherelement.

The term “processor” should be interpreted broadly to encompass ageneral purpose processor, a central processing unit (CPU), amicroprocessor, a digital signal processor (DSP), a controller, amicrocontroller, a state machine, and so forth. Under somecircumstances, a “processor” may refer to an application specificintegrated circuit (ASIC), a programmable logic device (PLD), a fieldprogrammable gate array (FPGA), etc. The term “processor” may refer to acombination of processing devices, e.g., a combination of a digitalsignal processor (DSP) and a microprocessor, a plurality ofmicroprocessors, at least one microprocessor in conjunction with adigital signal processor (DSP) core, or any other such configuration.

The term “memory” should be interpreted broadly to encompass anyelectronic component capable of storing electronic information. The termmemory may refer to various types of processor-readable media such asrandom access memory (RAM), read-only memory (ROM), non-volatile randomaccess memory (NVRAM), programmable read-only memory (PROM), erasableprogrammable read-only memory (EPROM), electrically erasable PROM(EEPROM), flash memory, magnetic or optical data storage, registers,etc. Memory is said to be in electronic communication with a processorif the processor can read information from and/or write information tothe memory. Memory that is integral to a processor is in electroniccommunication with the processor.

The terms “instructions” and “code” should be interpreted broadly to include any type of computer-readable statement(s). For example, the terms “instructions” and “code” may refer to one or more programs, routines, sub-routines, functions, procedures, etc. “Instructions” and “code” may comprise a single computer-readable statement or many computer-readable statements.

It should be noted that at least one of the features, functions,procedures, components, elements, structures, etc., described inconnection with any one of the configurations described herein may becombined with at least one of the functions, procedures, components,elements, structures, etc., described in connection with any of theother configurations described herein, where compatible. In other words,any compatible combination of the functions, procedures, components,elements, etc., described herein may be implemented in accordance withthe systems and methods disclosed herein.

The methods and apparatus disclosed herein may be applied generally inany transceiving and/or audio sensing application, especially mobile orotherwise portable instances of such applications. For example, therange of configurations disclosed herein includes communications devicesthat reside in a wireless telephony communication system configured toemploy a code-division multiple-access (CDMA) over-the-air interface.Nevertheless, it would be understood by those skilled in the art that amethod and apparatus having features as described herein may reside inany of the various communication systems employing a wide range oftechnologies known to those of skill in the art, such as systemsemploying Voice over IP (VoIP) over wired and/or wireless (e.g., CDMA,time division multiple access (TDMA), frequency division multiple access(FDMA), and/or time division synchronous code division multiple access(TDSCDMA)) transmission channels.

It is expressly contemplated and hereby disclosed that communicationsdevices disclosed herein may be adapted for use in networks that arepacket-switched (for example, wired and/or wireless networks arranged tocarry audio transmissions according to protocols such as VoIP) and/orcircuit-switched. It is also expressly contemplated and hereby disclosedthat communications devices disclosed herein may be adapted for use innarrowband coding systems (e.g., systems that encode an audio frequencyrange of about four or five kilohertz) and/or for use in wideband codingsystems (e.g., systems that encode audio frequencies greater than fivekilohertz), including whole-band wideband coding systems and split-bandwideband coding systems.

Examples of codecs that may be used with, or adapted for use with,transmitters and/or receivers of communications devices as describedherein include the Enhanced Variable Rate Codec, as described in theThird Generation Partnership Project 2 (3GPP2) document C.S0014-C, v1.0,titled “Enhanced Variable Rate Codec, Speech Service Options 3, 68, and70 for Wideband Spread Spectrum Digital Systems,” February 2007(available online at www.3gpp.org); the Selectable Mode Vocoder speechcodec, as described in the 3GPP2 document C.S0030-0, v3.0, titled“Selectable Mode Vocoder (SMV) Service Option for Wideband SpreadSpectrum Communication Systems,” January 2004 (available online atwww.3gpp.org); the Adaptive Multi Rate (AMR) speech codec, as describedin the document ETSI TS 126 092 V6.0.0 (European TelecommunicationsStandards Institute (ETSI), Sophia Antipolis Cedex, FR, December 2004);and the AMR Wideband speech codec, as described in the document ETSI TS126 192 V6.0.0 (ETSI, December 2004). Such a codec may be used, forexample, to recover the reproduced audio signal from a received wirelesscommunications signal.

The presentation of the described configurations is provided to enableany person skilled in the art to make or use the methods and otherstructures disclosed herein. The flowcharts, block diagrams and otherstructures shown and described herein are examples only, and othervariants of these structures are also within the scope of thedisclosure. Various modifications to these configurations are possible,and the generic principles presented herein may be applied to otherconfigurations as well. Thus, the present disclosure is not intended tobe limited to the configurations shown above but rather is to beaccorded the widest scope consistent with the principles and novelfeatures disclosed in any fashion herein, including in the attachedclaims as filed, which form a part of the original disclosure.

Those of skill in the art will understand that information and signalsmay be represented using any of a variety of different technologies andtechniques. For example, data, instructions, commands, information,signals, bits and symbols that may be referenced throughout the abovedescription may be represented by voltages, currents, electromagneticwaves, magnetic fields or particles, optical fields or particles, or anycombination thereof.

Important design requirements for implementation of a configuration asdisclosed herein may include minimizing processing delay and/orcomputational complexity (typically measured in millions of instructionsper second or MIPS), especially for computation-intensive applications,such as playback of compressed audio or audiovisual information (e.g., afile or stream encoded according to a compression format, such as one ofthe examples identified herein) or applications for widebandcommunications (e.g., voice communications at sampling rates higher thaneight kilohertz, such as 12, 16, 32, 44.1, 48, or 192 kHz).

An apparatus as disclosed herein (e.g., any device configured to performa technique as described herein) may be implemented in any combinationof hardware with software, and/or with firmware, that is deemed suitablefor the intended application. For example, the elements of such anapparatus may be fabricated as electronic and/or optical devicesresiding, for example, on the same chip or among two or more chips in achipset. One example of such a device is a fixed or programmable arrayof logic elements, such as transistors or logic gates, and any of theseelements may be implemented as one or more such arrays. Any two or more,or even all, of these elements may be implemented within the same arrayor arrays. Such an array or arrays may be implemented within one or morechips (for example, within a chipset including two or more chips).

One or more elements of the various implementations of the apparatusdisclosed herein may be implemented in whole or in part as one or moresets of instructions arranged to execute on one or more fixed orprogrammable arrays of logic elements, such as microprocessors, embeddedprocessors, intellectual property (IP) cores, digital signal processors,FPGAs (field-programmable gate arrays), ASSPs (application-specificstandard products), and ASICs (application-specific integratedcircuits). Any of the various elements of an implementation of anapparatus as disclosed herein may also be embodied as one or morecomputers (e.g., machines including one or more arrays programmed toexecute one or more sets or sequences of instructions, also called“processors”), and any two or more, or even all, of these elements maybe implemented within the same such computer or computers.

A processor or other means for processing as disclosed herein may be fabricated as one or more electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or logic gates, and any of these elements may be implemented as one or more such arrays. Such an array or arrays may be implemented within one or more chips (for example, within a chipset including two or more chips). Examples of such arrays include fixed or programmable arrays of logic elements, such as microprocessors, embedded processors, IP cores, DSPs, FPGAs, ASSPs and ASICs. A processor or other means for processing as disclosed herein may also be embodied as one or more computers (e.g., machines including one or more arrays programmed to execute one or more sets or sequences of instructions) or other processors. It is possible for a processor as described herein to be used to perform tasks or execute other sets of instructions that are not directly related to a procedure of an implementation of a method as disclosed herein, such as a task relating to another operation of a device or system in which the processor is embedded (e.g., an audio sensing device). It is also possible for part of a method as disclosed herein to be performed by a processor of the audio sensing device and for another part of the method to be performed under the control of one or more other processors.

Those of skill will appreciate that the various illustrative modules, logical blocks, circuits, and tests and other operations described in connection with the configurations disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. Such modules, logical blocks, circuits, and operations may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an ASIC or ASSP, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to produce the configuration as disclosed herein. For example, such a configuration may be implemented at least in part as a hard-wired circuit, as a circuit configuration fabricated into an application-specific integrated circuit, or as a firmware program loaded into non-volatile storage or a software program loaded from or into a data storage medium as machine-readable code, such code being instructions executable by an array of logic elements such as a general purpose processor or other digital signal processing unit. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. A software module may reside in a non-transitory storage medium such as RAM (random-access memory), ROM (read-only memory), nonvolatile RAM (NVRAM) such as flash RAM, erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, hard disk, a removable disk, or a CD-ROM; or in any other form of storage medium known in the art. An illustrative storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal. The term “computer-program product” refers to a computing device or processor in combination with code or instructions (e.g., a “program”) that may be executed, processed or computed by the computing device or processor.

It is noted that the various methods disclosed herein may be performed by an array of logic elements such as a processor, and that the various elements of an apparatus as described herein may be implemented as modules designed to execute on such an array. As used herein, the term “module” or “sub-module” can refer to any method, apparatus, device, unit or computer-readable data storage medium that includes computer instructions (e.g., logical expressions) in software, hardware or firmware form. It is to be understood that multiple modules or systems can be combined into one module or system and one module or system can be separated into multiple modules or systems to perform the same functions. When implemented in software or other computer-executable instructions, the elements of a process are essentially the code segments to perform the related tasks, such as with routines, programs, objects, components, data structures, and the like. The term “software” should be understood to include source code, assembly language code, machine code, binary code, firmware, macrocode, microcode, any one or more sets or sequences of instructions executable by an array of logic elements, and any combination of such examples. The program or code segments can be stored in a processor readable medium or transmitted by a computer data signal embodied in a carrier wave over a transmission medium or communication link.

The implementations of methods, schemes, and techniques disclosed herein may also be tangibly embodied (for example, in tangible, computer-readable features of one or more computer-readable storage media as listed herein) as one or more sets of instructions executable by a machine including an array of logic elements (e.g., a processor, microprocessor, microcontroller, or other finite state machine). The term “computer-readable medium” may include any medium that can store or transfer information, including volatile, nonvolatile, removable, and non-removable storage media. Examples of a computer-readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an erasable ROM (EROM), a floppy diskette or other magnetic storage, a CD-ROM/DVD or other optical storage, a hard disk or any other medium which can be used to store the desired information, a fiber optic medium, a radio frequency (RF) link, or any other medium which can be used to carry the desired information and can be accessed. The computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, etc. The code segments may be downloaded via computer networks such as the Internet or an intranet. In any case, the scope of the present disclosure should not be construed as limited by such embodiments. Each of the tasks of the methods described herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. In a typical application of an implementation of a method as disclosed herein, an array of logic elements (e.g., logic gates) is configured to perform one, more than one, or even all of the various tasks of the method. One or more (possibly all) of the tasks may also be implemented as code (e.g., one or more sets of instructions), embodied in a computer program product (e.g., one or more data storage media such as disks, flash or other nonvolatile memory cards, semiconductor memory chips, etc.), that is readable and/or executable by a machine (e.g., a computer) including an array of logic elements (e.g., a processor, microprocessor, microcontroller, or other finite state machine). The tasks of an implementation of a method as disclosed herein may also be performed by more than one such array or machine. In these or other implementations, the tasks may be performed within a device for wireless communications such as a cellular telephone or other device having such communications capability. Such a device may be configured to communicate with circuit-switched and/or packet-switched networks (e.g., using one or more protocols such as VoIP). For example, such a device may include RF circuitry configured to receive and/or transmit encoded frames.

It is expressly disclosed that the various methods disclosed herein may be performed by a portable communications device such as a handset, headset, or portable digital assistant (PDA), and that the various apparatus described herein may be included within such a device. A typical real-time (e.g., online) application is a telephone conversation conducted using such a mobile device.

In one or more exemplary embodiments, the operations described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, such operations may be stored on or transmitted over a computer-readable medium as one or more instructions or code. The term “computer-readable media” includes both computer-readable storage media and communication (e.g., transmission) media. By way of example, and not limitation, computer-readable storage media can comprise an array of storage elements, such as semiconductor memory (which may include without limitation dynamic or static RAM, ROM, EEPROM, and/or flash RAM), or ferroelectric, magnetoresistive, ovonic, polymeric, or phase-change memory; CD-ROM or other optical disk storage; and/or magnetic disk storage or other magnetic storage devices. Such storage media may store information in the form of instructions or data structures that can be accessed by a computer. Communication media can comprise any medium that can be used to carry desired program code in the form of instructions or data structures and that can be accessed by a computer, including any medium that facilitates transfer of a computer program from one place to another. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technology such as infrared, radio, and/or microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technology such as infrared, radio, and/or microwave are included in the definition of medium. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray Disc™ (Blu-Ray Disc Association, Universal City, Calif.), where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

An acoustic signal processing apparatus as described herein may be incorporated into an electronic device that accepts speech input in order to control certain operations, or may otherwise benefit from separation of desired noises from background noises, such as communications devices. Many applications may benefit from enhancing or separating clear desired sound from background sounds originating from multiple directions. Such applications may include human-machine interfaces in electronic or computing devices that incorporate capabilities such as voice recognition and detection, speech enhancement and separation, voice-activated control, and the like. It may be desirable to implement such an acoustic signal processing apparatus to be suitable in devices that only provide limited processing capabilities.

The elements of the various implementations of the modules, elements, and devices described herein may be fabricated as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or gates. One or more elements of the various implementations of the apparatus described herein may also be implemented in whole or in part as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements such as microprocessors, embedded processors, IP cores, digital signal processors, FPGAs, ASSPs, and ASICs.

It is possible for one or more elements of an implementation of an apparatus as described herein to be used to perform tasks or execute other sets of instructions that are not directly related to an operation of the apparatus, such as a task relating to another operation of a device or system in which the apparatus is embedded. It is also possible for one or more elements of an implementation of such an apparatus to have structure in common (e.g., a processor used to execute portions of code corresponding to different elements at different times, a set of instructions executed to perform tasks corresponding to different elements at different times, or an arrangement of electronic and/or optical devices performing operations for different elements at different times).

It is to be understood that the claims are not limited to the precise configuration and components illustrated above. Various modifications, changes and variations may be made in the arrangement, operation and details of the systems, methods, and apparatus described herein without departing from the scope of the claims.
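By way of illustration only (and not as part of the original disclosure or of the claims that follow), the following sketch suggests one way a display operation of the kind claimed below might be realized in software: an assumed device-orientation reading obtained from sensor data is used to rotate device-relative direction-of-arrival angles into an earth-aligned coordinate system, and a target source and an interfering source are then placed as markers on a two-dimensional polar plot. All names, angles, and the 30-degree heading are hypothetical example values.

    import math
    from dataclasses import dataclass

    # Illustrative sketch only; angles are in degrees, with 0 degrees assumed to
    # correspond to physical (earth) north. device_heading_deg is an assumed
    # compass reading taken from sensor data.

    @dataclass
    class SourceMarker:
        label: str      # e.g., "target" or "interferer"
        doa_deg: float  # direction of arrival relative to the device
        x: float = 0.0  # display coordinates on a unit-radius polar plot
        y: float = 0.0

    def place_on_polar_plot(sources, device_heading_deg, radius=1.0):
        """Rotate device-relative directions into earth-aligned display coordinates."""
        placed = []
        for src in sources:
            earth_deg = (src.doa_deg + device_heading_deg) % 360.0
            rad = math.radians(earth_deg)
            placed.append(SourceMarker(src.label, src.doa_deg,
                                       x=radius * math.sin(rad),
                                       y=radius * math.cos(rad)))
        return placed

    if __name__ == "__main__":
        sources = [SourceMarker("target", 20.0), SourceMarker("interferer", 135.0)]
        for marker in place_on_polar_plot(sources, device_heading_deg=30.0):
            print(f"{marker.label:10s} -> display ({marker.x:+.2f}, {marker.y:+.2f})")

Because the rotation is applied from the sensed heading, the plotted coordinate system keeps its orientation independent of how the device itself is turned.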

What is claimed is:
1. A method for displaying a user interface on an electronic device, comprising: presenting a user interface, wherein the user interface comprises a coordinate system, wherein the coordinate system corresponds to physical coordinates based on sensor data; and displaying at least a target audio signal and an interfering audio signal on the user interface.
2. The method of claim 1, further comprising displaying a directionality of at least one of the target audio signal and the interfering audio signal captured by at least one microphone.
3. The method of claim 2, wherein the target audio signal comprises a voice signal.
4. The method of claim 2, further comprising displaying at least one icon corresponding to at least one of the target audio signal and the interfering audio signal.
5. The method of claim 1, further comprising passing the target audio signal.
6. The method of claim 1, further comprising attenuating the interfering audio signal.
7. The method of claim 1, further comprising aligning at least a part of the user interface with a reference plane.
8. The method of claim 7, wherein the reference plane is horizontal.
9. The method of claim 7, wherein aligning at least a part of the user interface comprises mapping a two-dimensional polar plot into a three-dimensional display space.
10. The method of claim 1, wherein the physical coordinates are earth coordinates.
11. The method of claim 1, wherein the coordinate system maintains an orientation independent of electronic device orientation.
12. The method of claim 1, further comprising: recognizing an audio signature; looking up the audio signature in a database; obtaining identification information corresponding to the audio signature; and displaying the identification information on the user interface.
13. The method of claim 12, wherein the identification information is an image of a person corresponding to the audio signature.
14. An electronic device, comprising: a display, wherein the display presents a user interface, wherein the user interface comprises a coordinate system, wherein the coordinate system corresponds to physical coordinates based on sensor data; and the display displays at least a target audio signal and an interfering audio signal on the user interface.
15. The electronic device of claim 14, wherein the display displays a directionality of at least one of the target audio signal and the interfering audio signal captured by at least one microphone.
16. The electronic device of claim 15, wherein the target audio signal comprises a voice signal.
17. The electronic device of claim 15, wherein the display displays at least one icon corresponding to at least one of the target audio signal and the interfering audio signal.
18. The electronic device of claim 14, further comprising operation circuitry coupled to the display, wherein the operation circuitry passes the target audio signal.
19. The electronic device of claim 14, further comprising operation circuitry coupled to the display, wherein the operation circuitry attenuates the interfering audio signal.
20. The electronic device of claim 14, wherein the user interface aligns at least a part of the user interface with a reference plane.
21. The electronic device of claim 20, wherein the reference plane is horizontal.
22. The electronic device of claim 20, wherein aligning at least a part of the user interface comprises mapping a two-dimensional polar plot into a three-dimensional display space.
23. The electronic device of claim 14, wherein the physical coordinates are earth coordinates.
24. The electronic device of claim 14, wherein the coordinate system maintains an orientation independent of electronic device orientation.
25. The electronic device of claim 14, further comprising audio signature recognition circuitry that recognizes an audio signature, looks up the audio signature in a database, obtains identification information corresponding to the audio signature, and passes the identification information to the display.
26. The electronic device of claim 25, wherein the identification information is an image of a person corresponding to the audio signature.
27. A computer-program product for displaying a user interface, comprising a non-transitory tangible computer-readable medium having instructions thereon, the instructions comprising: code for causing an electronic device to present a user interface, wherein the user interface comprises a coordinate system, wherein the coordinate system corresponds to physical coordinates based on sensor data; and code for causing the electronic device to display at least a target audio signal and an interfering audio signal on the user interface.
28. The computer-program product of claim 27, wherein the instructions further comprise code for causing the electronic device to display a directionality of at least one of the target audio signal and the interfering audio signal captured by at least one microphone.
29. The computer-program product of claim 27, wherein the instructions further comprise code for causing the electronic device to pass the target audio signal.
30. The computer-program product of claim 27, wherein the instructions further comprise code for causing the electronic device to attenuate the interfering audio signal.
31. An apparatus for displaying a user interface, comprising: means for presenting a user interface, wherein the user interface comprises a coordinate system, wherein the coordinate system corresponds to physical coordinates based on sensor data; and means for displaying at least a target audio signal and an interfering audio signal on the user interface.
32. The apparatus of claim 31, further comprising means for displaying a directionality of at least one of the target audio signal and the interfering audio signal captured by at least one microphone.
33. The apparatus of claim 31, further comprising means for passing the target audio signal.
34. The apparatus of claim 31, further comprising means for attenuating the interfering audio signal.